pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	e8562173b8	Prevent an infinite loop when parsing corrupt /CCITTFaxDecode data (issue 14305) Fixes one of the documents in issue 14305.	2021-12-07 13:57:25 +01:00
Jonas Jenwald	40291d1943	Handle errors when fetching the raw /Metadata (issue 14305) Currently the `Catalog.metadata` getter only handles errors during parsing, however in a corrupt PDF document fetching of the raw /Metadata can obviously fail as well. Without this patch the `PDFDocumentProxy.getMetadata` method, in the API, can thus fail which it never should and this will cause the viewer to not initialize all state as expected. Fixes one of the documents in issue 14305.	2021-12-04 09:41:42 +01:00
Jonas Jenwald	1fac6371d3	[Regression] Eagerly fetch/parse the entire /Pages-tree in corrupt documents (issue 14303, PR 14311 follow-up) Please note: This is similar to the method that existed prior to PR 3848, but the new method will only be used as a fallback when parsing of corrupt PDF documents. The implementation in PR 14311 unfortunately turned out to be way too simplistic, as evident by the recently added test-files in issue 14303, since it may cause infinite loops in `PDFDocument.checkLastPage` for some corrupt PDF documents.[1] To avoid this, the easiest solution that I could come up with was to fallback to eagerly parsing the entire /Pages-tree when the /Count-entry validation fails during document initialization. Fixes at least two of the issues listed in issue 14303, namely the `poppler-395-0.pdf...` and `GHOSTSCRIPT-698804-1.pdf...` documents. --- [1] The whole point of PR 14311 was obviously to get rid of infinte loops during document initialization, not to introduce any more of those.	2021-12-02 14:31:04 +01:00
Jonas Jenwald	63be23f05b	Handle errors correctly when data lookup fails during /Pages-tree parsing (issue 14303) This only applies to severely corrupt documents, where it's possible that the `Parser` throws when we try to access e.g. a /Kids-entry in the /Pages-tree. Fixes two of the issues listed in issue 14303, namely the `poppler-742-0.pdf...` and `poppler-937-0.pdf...` documents.	2021-12-02 10:54:40 +01:00
Jonas Jenwald	a807ffe907	Prevent circular references in XRef tables from hanging the worker-thread (issue 14303) Please note: While this patch on its own is sufficient to prevent the worker-thread from hanging, however in combination with PR 14311 these PDF documents will both load and render correctly. Rather than focusing on the particular structure of these PDF documents, it seemed (at least to me) to make sense to try and prevent all circular references when fetching/looking-up data using the XRef table. To avoid a solution that required tracking the references manually everywhere, the implementation settled on here instead handles that internally in the `XRef.fetch`-method. This should work, since that method and the `Parser`/`Lexer`-implementations are completely synchronous. Note also that the existing `XRef`-caching, used for all data-types except Streams, should hopefully help to lessen the performance impact of these changes. One potential problem with these changes could be certain browser exceptions, since those are generally not catchable in JavaScript code, however those would most likely "stop" worker-thread parsing anyway (at least I hope so). Finally, note that I settled on returning dummy-data rather than throwing an exception. This was done to allow parsing, for the rest of the document, to continue such that one bad reference doesn't prevent an entire document from loading. Fixes two of the issues listed in issue 14303, namely the `poppler-91414-0.zip-2.gz-53.pdf` and `poppler-91414-0.zip-2.gz-54.pdf` documents.	2021-11-27 23:50:26 +01:00
Jonas Jenwald	d0c4bbd828	[api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303) This patch basically extends the approach from PR 10392, by also checking the last page. Currently, in e.g. the `Catalog.numPages`-getter, we're simply assuming that if the /Pages-tree has an integer /Count entry it must also be correct/valid. As can be seen in the referenced PDF documents, that entry may be completely bogus which causes general parsing to breaking down elsewhere in the worker-thread (and hanging the browser). Rather than hoping that the /Count entry is correct, similar to all other data found in PDF documents, we obviously need to validate it. This turns out to be a little less straightforward than one would like, since the only way to do this (as far as I know) is to parse the entire /Pages-tree and essentially counting the pages. To avoid doing that for all documents, this patch tries to take a short-cut by checking if the last page (based on the /Count entry) can be successfully fetched. If so, we assume that the /Count entry is correct and use it as-is, otherwise we'll iterate through (potentially) the entire /Pages-tree to determine the number of pages. Unfortunately these changes will have a number of somewhat negative side-effects, please see a possibly incomplete list below, however I cannot see a better way to address this bug. - This will slow down initial loading/rendering of all documents, at least by some amount, since we now need to fetch/parse more of the /Pages-tree in order to be able to access the last page of the PDF documents. - For poorly generated PDF documents, where the entire /Pages-tree only has one level, we'll unfortunately need to fetch/parse the entire /Pages-tree to get to the last page. While there's a cache to help reduce repeated data lookups, this will affect initial loading/rendering of some long PDF documents, - This will affect the `disableAutoFetch = true` mode negatively, since we now need to fetch/parse more data during document initialization. While the `disableAutoFetch = true` mode should still be helpful in larger/longer PDF documents, for smaller ones the effect/usefulness may unfortunately be lost. As one small additional bonus, we should now also be able to support opening PDF documents where the /Pages-tree /Count entry is completely invalid (e.g. contains a non-integer value). Fixes two of the issues listed in issue 14303, namely the `poppler-67295-0.pdf` and `poppler-85140-0.pdf` documents.	2021-11-27 21:57:35 +01:00
Calixte Denizet	31e13515f5	XFA - Draw arcs correctly - it aims to fix #14315; - take into account the startAngle to compute the coordinates of the final point.	2021-11-27 19:30:12 +01:00
Jonas Jenwald	ca8d2bdce4	Abort parsing when the XRef /W-array contain bogus entries (issue 14303) For this particular PDF document, we have `/W [1 2 166666666666666666666666666]` which obviously makes no sense. While this patch makes no attempt at actually validating the entries in the /W-array, we'll now simply abort all processing when the end of the PDF document has been reached (thus preventing hanging the browser). Please note that this patch doesn't enable the PDF document to be loaded/rendered, but at least it fails "correctly" now. Fixes one of the issues listed in issue 14303, namely the `REDHAT-1531897-0.pdf`document.	2021-11-25 18:35:08 +01:00
Jonas Jenwald	ae4f1ae3e7	Ensure that `ChunkedStream` won't attempt to request data beyond the document size (issue 14303) This bug was surprisingly difficult to track down, since it didn't just depend on range-requests being used but also on how quickly the document was loaded. To even be able to reproduce this locally, I had to use a very small `rangeChunkSize`-value (note the unit-test). The cause of this bug is a bogus entry in the XRef-table, causing us to attempt to request data from beyond the actual document size and thus getting into an infinite loop. Fixes one of the issues listed in issue 14303, namely the `PDFBOX-4352-0.pdf` document.	2021-11-24 19:19:43 +01:00
Calixte Denizet	7041c62ccf	Remove non-displayable chars from outline title (#14267 ) - it aims to fix #14267; - there is nothing about chars in range [0-1F] in the specs but acrobat doesn't display them in any way.	2021-11-13 16:56:08 +01:00
Jonas Jenwald	ea1c348c67	Always prefer abbreviated keys, over full ones, when doing any dictionary lookups (issue 14256) Note that issue 14256 was specifically about inline images, please refer to: - https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.1852045 - https://www.pdfa.org/safedocs-unearths-pdf-inline-image-issue/ - https://pdf-issues.pdfa.org/32000-2-2020/clause08.html#H8.9.7 However, during review of the initial PR in https://github.com/mozilla/pdf.js/pull/14257#issuecomment-964469710, it was suggested that we instead do this unconditionally for all dictionary lookups. In addition to re-ordering the existing call-sites in the `src/core`-code, and adding non-PRODUCTION/TESTING asserts to catch future errors, for consistency a number of existing `if`/`switch`-blocks were re-factored to also check the abbreviated keys first.	2021-11-10 11:56:18 +01:00
calixteman	e136afbabc	Merge pull request #14218 from janekotovich/subform_min_0 XFA subform with occur min=0 and no bound data displaying.	2021-11-05 04:12:34 -07:00
Jonas Jenwald	8222d6530b	Merge pull request #14232 from brendandahl/show-text-pattern Use correct matrix for patterns with showText.	2021-11-05 10:04:56 +01:00
Brendan Dahl	1c7048399b	Use correct matrix for patterns with showText. We were incorrectly using the transform in the pattern before it had been adjusted causing the pattern to be misplaced relative to the page. Fixes: ShowText-ShadingPattern.pdf (already in corpus) Fixes: #8111 Fixes: #9243	2021-11-04 16:57:36 -07:00
Jane-Kotovich	56b502391c	XFA subform with occur min=0 and no bound data displaying Subfrom nomin displays even though it's subform is set to <occur max=-1 min=0> If we look through specs of XFA 3.3 : https://www.pdfa.org/norm-refs/XFA-3_3.pdf - The min attribute is used when processing a form that contains data. Regardless of the data at least this number of instances is included. It is permissible to set this value to zero, in which case the container is entirely excluded if there is no data for it. However, in our case it doesn't happen, because we let our empty dataNode get through. Though by setting a clause: - eliminate unmatched data with occur min=0 we are checking our empty data and sending it to uselessNode array where at the end it gets removed;	2021-11-04 20:22:05 +10:00
Jonas Jenwald	5f77d3719b	Tweak the Bidi-detection heuristics for very short RTL strings (issue 11656) Very short strings can narrowly miss the existing Bidi-detection threshold, leading to incorrect text-selection and copying behaviour. In my testing, neither Adobe Reader or PDFium seem to handle copying "correctly" for this document. Hence it's not entirely clear to me that we actually want to fix this, since tweaking these heuristics can obviously cause regressions elsewhere (and our test coverage for RTL-text isn't exactly great).	2021-11-03 20:31:57 +01:00
Jonas Jenwald	8edec018fe	Add a RTL-text reference test (issue 10301) It seems that issue 10301 was fixed by PR 13424, by combining the spans, however given that we don't have a lot of test coverage for RTL-text I figured that adding a simple reference test wouldn't hurt (rather than just closing the issue as WORKSFORME).	2021-10-31 16:55:11 +01:00
Calixte Denizet	cf8dc750d6	Support rich content in markup annotation - use the xfa parser but in the xhtml namespace.	2021-10-31 13:44:51 +01:00
Tim van der Meij	0e7614df7f	Merge pull request #14180 from Snuffleupagus/bug-1627427 Handle ranges that "overflow" the last byte in `CMap.mapBfRange` (bug 1627427)	2021-10-27 20:06:09 +02:00
Jane-Kotovich	91fc643ff9	[api-minor] Implement securityHandler in the scripting API (bug 1731578)	2021-10-26 23:42:04 +10:00
Jonas Jenwald	aa1b78684f	Handle ranges that "overflow" the last byte in `CMap.mapBfRange` (bug 1627427)	2021-10-24 13:48:38 +02:00
Jonas Jenwald	52372b9378	Merge pull request #14175 from brendandahl/smask-v2 Use a new method for handling soft masks.	2021-10-23 09:27:18 +02:00
Brendan Dahl	2d1f9ff7a3	Use a new method for handling soft masks. The old method of handling soft masks had a number of issues where the temporary drawing canvas and the suspended main canvas could get out of sync (e.g. mismatched save/restores or clip state) or we could end up compositing at the wrong time. A good example of things getting out sync is the reduced test case in #9017. To fix this I've changed two big things: 1) Duplicate all the needed graphics state from the temporary canvas to the suspended main canvas. This ensure the canvases stay in sync so that when we switch back to the main canvas the graphics state stack is the same (e.g. transforms, clip paths). 2) Immediately composite after each drawing operation. This ensures that if there's an active clip region that we'll still be able to composite the correct portions of the canvas. Note: This solution could be avoided by using getImageData and putImageData since those ignore clipping region, but this is very very slow. Note2: I also think the old way of only compositing at the end of the soft mask is incorrect and can lead to wrong colors if drawing over the same region, but in practice this doesn't seem to matter much. Fixes: #5781 Fixes: #5853 Fixes: #7267 Fixes: #7891 Fixes: #8403 Fixes: #8624 Fixes: #12798 Fixes: #13891 Fixes: #9017 (reduced test case) Fixes: https://bugzilla.mozilla.org/show_bug.cgi?id=1703683	2021-10-22 13:41:21 -07:00
calixteman	bbb64369f1	Merge pull request #13424 from calixteman/chunks2 [api-minor] Fix issues in text selection	2021-10-18 06:14:15 -07:00
Calixte Denizet	61d1063276	Fix issues in text selection - PR #13257 fixed a lot of issues but not all and this patch aims to fix almost all remaining issues. - the idea in this new patch is to compare position of new glyph with the last position where a glyph has been drawn; - no space are "drawn": it just moves the cursor but they aren't added in the chunk; - so this way a space followed by a cursor move can be treated as only one space: it helps to merge all spaces into one. - to make difference between real spaces and tracking ones, we used a factor of the space width (from the font) - it was a pretty good idea in general but it fails with some fonts where space was too big: - in Poppler, they're using a factor of the font size: this is an excellent idea (<= 0.1 * fontSize implies tracking space).	2021-10-17 16:27:05 +02:00
Jay Berkenbilt	586295fad6	Implement TrueType character map "format 2" (fixes #14117 ) If a PDF included an embedded TrueType font whose preferred character map (cmap) was in "format 2", the code would select that character map and then refuse to read it because of an unsupported format, thus causing the characters not to be rendered. This commit implements support for format 2 as described at the link below. https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6cmap.html	2021-10-13 07:37:14 -04:00
Jonas Jenwald	284d259054	Merge pull request #14057 from Snuffleupagus/bug-920426 Support CMap-data with only strings, when parsing TrueType composite fonts (bug 920426)	2021-10-01 23:22:25 +02:00
Calixte Denizet	aecbd7cd89	AcroForm: Add support for ResetForm action - it aims to fix #12721. - Thanks to PR #14023, we've now the fieldObjects in the annotation layer so we can easily map fields names on their id if needed. - Reset values in the storage, in the JS sandbox and in the visible html elements.	2021-09-30 22:02:33 +02:00
Jonas Jenwald	d3ca28bc34	Support CMap-data with only strings, when parsing TrueType composite fonts (bug 920426) In the referenced bug, the embedded fonts contain custom CMap-data that only include strings. Note how for embedded composite TrueType fonts we're using the CMap-data when building the glyph mapping, and currently we end up with a completely empty map because the code expects only CID numbers. Furthermore, just fixing the glyph mapping alone isn't sufficient to fully address the bug, since we also need to consider this "special" kind of CMap-data when looking up glyph widths.	2021-09-30 18:10:47 +02:00
Calixte Denizet	748ab4983c	Add the missing pdf file for the test in the PR #14049	2021-09-29 22:07:07 +02:00
Jonas Jenwald	1dcd2f0cd3	[api-minor] Add basic support for RTL text-content in PopupAnnotations (issue 14046) In order to implement this, we utilize the existing `bidi` function to infer the text-direction of /T and /Contents entries. While this may not be perfect in cases where one PopupAnnotation mixes LTR and RTL languages, it should work well enough in most cases. To avoid having to add two new properties in lots of annotations, supplementing the existing `title`/`contents`-properties, this patch instead re-factors the existing code such that the properties are replaced by Objects (containing `str` and `dir`). Please note: In order avoid breaking existing third-party implementations, `GENERIC`-builds of the PDF.js library will still provide the old `title`/`contents`-properties on annotations returned by `PDFPageProxy.getAnnotations`.	2021-09-25 09:18:58 +02:00
Calixte Denizet	386acf5bdd	Integration test for PR #14023	2021-09-23 13:05:18 +02:00
Jonas Jenwald	8ea27ce157	Tweak how fonts with an /Encoding are handled in `adjustToUnicode` (issue 14048, PR 13277 follow-up) Currently we only exclude /Encoding entries that also contains a /Differences array, which is the cause of the text-selection problem in the referenced issue. In order to address this we'll now also exclude /Encoding entries that contain one of the predefined named encodings, and no longer require that it also contains a /Differences array. Please note: This patch cases a small "regression" in the `bug1130815-text` test-case, however this is actually an improvement when compared with Adobe Reader and PDFium (in Google Chrome).	2021-09-18 22:44:25 +02:00
Jonas Jenwald	a11343e9af	Improve glyph mapping for non-embedded composite standard fonts with a /CIDToGIDMap (issue 11915) Please note: All of this feels very handwavy, but at least it passes all tests locally. Hopefully we have enough tests for this part of the font code. For non-embedded composite standard fonts with an "incomplete" /CIDToGIDMap, we'll now fallback to an explicitly defined /ToUnicode map even when that one happens to be an /Identity-H or /Identity-V map. The `Font.fallbackToSystemFont` method is unfortunately getting more and more special-cases, however that might be unavoidable given all the weird non-embedded fonts found in the wild :-(	2021-09-15 11:30:40 +02:00
Brendan Dahl	f38fb42b42	Enable/disable image smoothing based on image interpolate value. (bug 1722191) While some of the output looks worse to my eye, this behavior more closely matches what I see when I open the PDFs in Adobe acrobat. Fixes: #4706, #9713, #8245, #1344	2021-09-10 14:23:35 -07:00
Jonas Jenwald	3ccf277f58	Fallback to the /ToUnicode map for TrueType fonts with (3, 1) and (1, 0) cmap-tables (issue 13316) In the PDF document some of the glyphs have bogus `differences`-entries[1] that cannot be resolved to valid glyph names, thus causing the glyph mapping to fail. My initial idea was to use a similar approach as in the `PartialEvaluator._simpleFontToUnicode`-method, to extract the charCodes from those entries, however it turned out that that didn't actually help in this case (the mapping was still wrong). To fix this I'm thus proposing that we fallback to the /ToUnicode map when no other useable data exists (e.g. no post-table), since it hopefully shouldn't make things any worse than leaving parts of the glyph map empty (which currently happens). --- [1] As can be seem below, some of the entries are completely normal while others are non-standard: ``` Differences (array) 0 = 65 1 = /g5167 2 = /space 3 = /g11927 4 = /g17737 5 = /g11540 6 = /g2180 7 = /K 8 = /P 9 = /two 10 = /zero 11 = /one 12 = /five 13 = /four 14 = /g6932 15 = /g7246 16 = /g1691 17 = /g2343 18 = /g14792 19 = /g3325 20 = /g4280 21 = /g20383 22 = /g18166 23 = /g16988 24 = /g17943 25 = /g19223 26 = /g10830 27 = 97 28 = /g982 29 = /g1226 30 = /g5059 31 = /g2677 32 = /g1042 33 = /g11568 34 = /L 35 = /three 36 = /seven 37 = /g2364 38 = /g12063 39 = /g5356 40 = /g2173 41 = /g17877 42 = /g7273 43 = /g7647 44 = /g7224 45 = /g19327 46 = /g5054 47 = /g2342 48 = /g10136 49 = /g6856 50 = /g13381 51 = /g7257 52 = /g12093 53 = /g2359 ```	2021-09-04 07:38:22 +02:00
Brendan Dahl	a7f807b059	Only use base encoding if it's populated. (bug 1727053) The font dict in this file has an encoding entry, but only specifies a differences map. The base encoding is empty in this case and shouldn't be used.	2021-08-30 12:51:59 -07:00
Jonas Jenwald	853b1172a1	Support Optional Content in Image-/XObjects (issue 13931) Currently, in the `PartialEvaluator`, we only support Optional Content in Form-/XObjects. Hence this patch adds support for Image-/XObjects as well, which looks like a simple oversight in PR 12095 since the canvas-implementation already contains the necessary code to support this.	2021-08-26 16:54:15 +02:00
Jonas Jenwald	ac27f96987	Extend the glyph maps for standard respectively Calibri fonts (issue 13916)	2021-08-21 00:48:38 +02:00
Brendan Dahl	da1af02ac8	Improve performance of reused patterns. Bug 1721218 has a shading pattern that was used thousands of times. To improve performance of this PDF: - add a cache for patterns in the evaluator and only send the IR form once to the main thread (this also makes caching in canvas easier) - cache the created canvas radial/axial patterns - for shading fill radial/axial use the pattern directly instead of creating temporary canvas	2021-07-22 16:47:40 -07:00
Brendan Dahl	a52c0c6988	Fix transformations when painting image masks and tiling patterns. Previously, when we filled image masks we didn't copy over the current transformation, this caused patterns to be misaligned when painted. Now we create a temporary canvas with the mask and have the transform copied over and offset it relative to where the mask would be painted. We also weren't properly offsetting tiling patterns. This isn't usually noticeable since patters repeat, but in the case of #13561 the pattern is only drawn once and has to be in the correct position to line up with the mask image. These fixes broke #11473, but highlighted that we were drawing that correctly by accident and not correctly handling negative bounding boxes on tiling patterns. Fixes #6297, #13561, #13441 Partially fixes #1344 (still blurry but boxes are in correct position now)	2021-07-06 17:29:32 -07:00
Calixte Denizet	429ffdcd2f	XFA - Save filled data in the pdf when downloading the file (Bug 1716288) - when binding (after parsing) we get a map between some template nodes and some data nodes; - so set user data in input handlers in using data node uids in the annotation storage; - to save the form, just put the value we have in the storage in the correct data nodes, serialize the xml as a string and then write the string at the end of the pdf using src/core/writer.js; - fix few bugs around data bindings: - the "Off" issue in Bug 1716980.	2021-06-25 18:57:01 +02:00
Brendan Dahl	5efaaa0fea	Fix how patterns are applied to image mask objects. Note, this only really fixes Radial/Axial shading patterns with masks. I'm guessing tiling patterns and mesh patterns would also be broken if applied like the test pdf. Hopefully I'll have some time to make test cases for the other shadings. Fixes #13372	2021-06-16 20:06:41 -07:00
Jonas Jenwald	e7dc822e74	Merge pull request #12726 from brendandahl/standard-fonts [api-minor] Include and use the 14 standard font files.	2021-06-08 10:09:40 +02:00
Brendan Dahl	4c1dd47e65	Include and use the 14 standard fonts files.	2021-06-07 11:10:11 -07:00
Jonas Jenwald	20770cb06a	Improve text-selection for Type3 fonts with empty /FontBBox-entries (issue 6605) For Type3 fonts where the /CharProcs-streams of the individual glyph starts with a `d1` operator, we can use that to build a fallback bounding box for the font and thus improve text-selection in some cases.	2021-06-05 08:09:29 +02:00
Brendan Dahl	6255c2a8f3	Merge pull request #13376 from calixteman/6132 Replace command with not enough args by an endchar in CFF font	2021-06-04 14:00:51 -07:00
Tim van der Meij	a0ce3cb3b4	Merge pull request #13448 from Snuffleupagus/_setDefaultAppearance-alpha Support strokeAlpha/fillAlpha when creating a fallback appearance stream (issue 6810)	2021-05-28 23:39:36 +02:00
Jonas Jenwald	707a9e3b02	Work-around for HighlightAnnotations without a top-level /ExtGState-entry (issue 13242) For HighlightAnnotations with a built-in appearance stream, we still rely on it to specify the opacity correctly via a suitable blend mode. However, if the Annotation-drawing operators are placed within a /XObject of the /Form-type, the /ExtGState won't apply to the final rendering and the result is that the highlighting obscures the underlying text. The more correct and general solution would likely be to somehow modify the implementation in `src/display/canvas.js`, to special-case handling of /Form-type /XObjects when rendering Annotations. Since we can very easily work-around this problem for now by using the "no appearance stream" code-path, doing something here ought to be preferable. This patch is (obviously) merely a work-around, but given that the referenced issue is (as far as I know) the first case we've seen of this problem a simple solution will hopefully suffice for now.	2021-05-28 13:49:27 +02:00
Jonas Jenwald	a6447f2ca2	Support strokeAlpha/fillAlpha when creating a fallback appearance stream (issue 6810) This fixes the colours, by respecting the strokeAlpha/fillAlpha-values, for a couple of Annotations in the PDF document from issue 13447.[1] --- [1] Some of the annotations still won't render at all, when compared with Adobe Reader, but that could/should probably be handled separately.	2021-05-27 16:23:18 +02:00

1 2 3 4 5 ...

569 Commits