pdf.js

Author	SHA1	Message	Date
Tsukasa OI	96ba6afd47	Fix copying on supplementary plane characters pdf.js had a problem when copying characters on supplementary planes (0xPPXXXX where PP is nonzero). This is because certain methods of PartialEvaluator use classic String.fromCharCode instead of ES6's String.fromCodePoint. Despite the fact that readToUnicode method tried to parse out-of-UCS2 code points by parsing UTF-16BE, it was inadequate because String.fromCharCode only supports UCS-2 range of Unicode.	2019-02-10 18:14:53 +09:00
Tim van der Meij	e2701d5422	Merge pull request #10482 from janpe2/indexed-decode Implement Decode entry in Indexed images	2019-01-24 23:46:55 +01:00
Jonas Jenwald	41fbc71ef9	Ensure that `XRef.indexObjects` can handle object numbers with zero-padding (issue 10491) All objects in the PDF document follow this pattern: ``` 0000000001 0 obj << % Some content here... >> endobj 0000000002 0 obj << % More content here... endobj ```	2019-01-24 22:37:18 +01:00
Jani Pehkonen	26121177ab	Implement Decode entry in Indexed images	2019-01-22 22:51:04 +02:00
Jonas Jenwald	b531fc4106	Avoid truncating inline images, where the data and the "EI" marker are glued together (issue 10388) (#10436) Thanks to the excellent debugging done by @janpe2, this was easy to fix!	2019-01-12 20:31:23 +01:00
Jonas Jenwald	d4a3858ed5	Handle more cases of corrupt PDF files with missing 'endobj' operators, where the "obj" string is immediately followed by the dictionary (PR 9288 follow-up)	2019-01-10 17:55:28 +01:00
Brendan Dahl	32eace043b	Fix reading number of HMTX metrics. The length of the HHEA table can be incorrect, so it is better to read the number of metrics offset from beginning of table instead.	2019-01-04 15:13:13 -08:00
Brendan Dahl	e2686db49b	Merge pull request #10277 from janpe2/cff-stems Repair CFF fonts if stem hints are in wrong order	2019-01-03 10:30:43 -08:00
Jonas Jenwald	60bcce184e	Check that the first page can be successfully loaded, to try and ascertain the validity of the XRef table (issue 7496, issue 10326) For PDF documents with sufficiently broken XRef tables, it's usually quite obvious when you need to fallback to indexing the entire file. However, for certain kinds of corrupted PDF documents the XRef table will, for all intents and purposes, appear to be valid. It's not until you actually try to fetch various objects that things will start to break, which is the case in the referenced issues[1]. Since there's generally a real effort being made in PDF.js to load even corrupt PDF documents, this patch contains a suggested approach to attempt to do a bit more validation of the XRef table during the initial document loading phase. Here the choice is made to attempt to load the first page, as a basic sanity check of the validity of the XRef table. Please note that attempting to load a more-or-less arbitrarily chosen object without any context of what it's supposed to be isn't very useful, which is why this particular choice was made. Obviously, just because the first page can be loaded successfully that doesn't guarantee that the entire XRef table is valid, however if even the first page fails to load you can be reasonably sure that the document is not valid[2]. Even though this patch won't cause any significant increase in the amount of parsing required during initial loading of the document[3], it will require loading of more data upfront which thus delays the initial `getDocument` call. Whether or not this is a problem depends very much on what you actually measure, please consider the following examples:
```javascript
console.time('first');
getDocument(...).promise.then((pdfDocument) => {
  console.timeEnd('first');
});

console.time('second');
getDocument(...).promise.then((pdfDocument) => {
  pdfDocument.getPage(1).then((pdfPage) => { // Note: the API uses `pageNumber >= 1`, the Worker uses `pageIndex >= 0`.
    console.timeEnd('second');
  });
});
```
The first case is pretty much guaranteed to show a small regression, however the second case won't be affected at all since the Worker caches the result of `getPage` calls. Again, please remember that the second case is what matters for the standard PDF.js use-case which is why I'm hoping that this patch is deemed acceptable. --- [1] In issue 7496, the problem is that the document is edited without the XRef table being correctly updated. In issue 10326, the generator was sorting the XRef table according to the offsets rather than the objects. [2] The idea of checking the first page in particular came from the "standard" use-case for the PDF.js library, i.e. the default viewer, where a failure to load the first page basically means that nothing will work; note how `{BaseViewer, PDFThumbnailViewer}.setDocument` depends completely on being able to fetch the first page. [3] The only extra parsing is caused by, potentially, having to traverse part of the `Pages` tree to find the first page.	2018-12-29 12:47:25 +01:00
Jani Pehkonen	9e990f6f3e	Repair CFF fonts if stem hints are in wrong order	2018-11-20 18:50:37 +02:00
Simon Leblanc	b5806735d8	Add support of Ink annotation	2018-10-03 00:28:49 +02:00
Tim van der Meij	66422eb83e	Merge pull request #9340 from brendandahl/private-use Map all glyphs to the private use area and duplicate the first glyph.	2018-09-08 17:51:04 +02:00
Brendan Dahl	b76cf665ec	Map all glyphs to the private use area and duplicate the first glyph. There have been lots of problems with trying to map glyphs to their unicode values. It's more reliable to just use the private use areas so the browser's font renderer doesn't mess with the glyphs. Using the private use area for all glyphs did highlight other issues that this patch also had to fix: * small private use area - Previously, only the BMP private use area was used which can't map many glyphs. Now, the (much bigger) PUP 16 area can also be used. * glyph zero not shown - Browsers will not use the glyph from a font if it is glyph id = 0. This issue was less prevalent when we mapped to unicode values since the fallback font would be used. However, when using the private use area, the glyph would not be drawn at all. This is illustrated in one of the current test cases (issue #8234) where there's an "ä" glyph at position zero. The PDF looked like it rendered correctly, but it was actually not using the glyph from the font. To properly show the first glyph it is always duplicated and appended to the glyphs and the maps are adjusted. * supplementary characters - The private use area PUP 16 is 4 bytes, so String.fromCodePoint must be used where we previously used String.fromCharCode. This is actually an issue that should have been fixed regardless of this patch. * charset - Freetype fails to load fonts when the charset size doesn't match number of glyphs in the font. We now write out a fake charset with the correct length. This also brought up the issue that glyphs with seac/endchar should only ever write a standard charset, but we now write a custom one. To get around this the seac analysis is permanently enabled so those glyphs are instead always drawn as two glyphs.	2018-09-05 14:04:54 -07:00
Jonas Jenwald	e5a6d892b4	Revert "Attempt to combine separate beginText/endText sequences in `getTextContent` (issue 9984)"	2018-09-05 18:01:33 +02:00
Tim van der Meij	c94df0fef3	Merge pull request #9986 from Snuffleupagus/issue-9984 Attempt to combine separate beginText/endText sequences in `getTextContent` (issue 9984)	2018-09-01 21:21:29 +02:00
Jonas Jenwald	95e5bad4c4	Attempt to find truncated endstream commands, in the fallback code-path, in `Parser.makeStream` (issue 10004) Apparently there are some PDF generators, in this case the culprit is "Nooog Pdf Library / Nooog PStoPDF v1.5", that manage to mess up PDF creation enough that endstream[1] commands actually become truncated. Please note: The solution implemented here isn't perfect, since it won't be able to cope with PDF files that contain a mixture of correct and truncated endstream commands. However, considering that this particular mode of corruption fortunately doesn't seem very common[2], a slightly less complex solution ought to suffice for now. Fixes 10004. --- [1] Scanning through the PDF data to find endstream commands becomes necessary, in order to determine the stream length in cases where the `Length` entry of the (stream) dictionary is missing/incorrect. [2] I cannot recall having seen any (previous) issues/bugs with "Missing endstream" errors.	2018-08-26 11:51:11 +02:00
Jonas Jenwald	497b765ede	Attempt to combine separate beginText/endText sequences in `getTextContent` (issue 9984) Please note that while this improves issue 9984 slightly (and likely others too), it's not a complete solution. The remaining issues are related to the, more general, problems with the existing heuristics related to attempting to combine separate text items.	2018-08-18 13:45:32 +02:00
Brendan Dahl	5f67a6a237	Always fallback to system font on font failure. The font in the PDF is marked as a CIDFontType0, but the font file is actually a true type font. To fully address this issue we should really peek into the font file and try to determine what it is. However, this is the first case of this issue, so I think this solution is acceptable for now.	2018-08-03 16:49:22 -07:00
Brian	2a665ebad4	Removed Extraneous Matrix Check in CalRGB Conversion	2018-08-02 10:16:42 -07:00
Tim van der Meij	716acf63d4	Merge pull request #9938 from Snuffleupagus/issue-9915 Ensure that Type0, i.e. composite, OpenType fonts with `CFF ` tables are not treated as CFF fonts if their glyph mapping is non-default (issue 9915)	2018-08-02 00:11:18 +02:00
Jonas Jenwald	3ce420131f	Prefer the Width/Height of the image data, rather than the image dictionary, for JPEG 2000 images (issue 9650) According to the PDF specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=45 > When using the JPXDecode filter with image XObjects, the following changes to and constraints on some entries in the image dictionary shall apply (see 8.9.5, "Image Dictionaries" for details on these entries): > > - Width and Height shall match the corresponding width and height values in the JPEG2000 data. > > - . . . Hence it seems reasonable to use the Width/Height of the image data itself, rather than the image dictionary when there's a mismatch. Given that JPEG 2000 images are already being parsed, in order to obtain basic parameters, the actual Width/Height is readily available in the `PDFImage` constructor.	2018-08-01 16:42:26 +02:00
Jonas Jenwald	690bcc8c8a	Add a reduced, `eq`, test-case for issue 9915	2018-07-29 23:06:15 +02:00
Jonas Jenwald	2b25deb84c	Prevent errors in `sanitizeTTProgram`, during parsing of CALL functions, when encountering invalid functions stack deltas (bug 1473809) I was feeling bored; so this is a very quick, and somewhat naive, attempt at fixing the bug. The breaking error, i.e. `Error during font loading: invalid array length`, was thrown when attempting to re-size the `stack` to a negative length when parsing the CALL functions. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1473809.	2018-07-10 09:45:55 +02:00
Jonas Jenwald	7f21e38787	Error, rather than warn, once a number of invalid path operators are encountered in `EvaluatorPreprocessor.read` (bug 1443140) Incomplete path operators, in particular, can result in fairly chaotic rendering artifacts, as can be observed on page four of the referenced PDF file. The initial (naive) solution that was attempted, was to simply throw a `FormatError` as soon as any invalid (i.e. too short) operator was found and rely on the existing `ignoreErrors` code-paths. However, doing so would have caused regressions in some files; see the existing `issue2391-1` test-case, which was promoted to an `eq` test to help prevent future bugs. Hence this patch, which adds special handling for invalid path operators since those may cause quite bad rendering artifacts. You could, in all fairness, argue that the patch is a handwavy solution and I wouldn't object. However, given that this only concerns corrupt PDF files, the way that PDF viewers (PDF.js included) try to gracefully deal with those could probably be described as a best-effort solution anyway. This patch also adjusts the existing `warn`/`info` messages to print the command name according to the PDF specification, rather than an internal PDF.js enumeration value. The former should be much more useful for debugging purposes. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1443140.	2018-06-24 16:05:08 +02:00
Jonas Jenwald	56e3648b65	Add basic validation of the 'trailer' dictionary candidates in `XRef.indexObjects` (issue 9418) This patch avoids choosing a (possible) 'trailer' dictionary that `XRef.parse` and/or the `Catalog` constructor/methods will reject anyway. Since `XRef.indexObjects` is already parsing the entire PDF file, the extra dictionary look-ups added here shouldn't matter much. Besides, this is a fallback code-path that only applies to corrupt PDF files anyway.	2018-06-20 13:41:22 +02:00
Jonas Jenwald	6bbcafcd26	Let `Lexer.getNumber` treat a single decimal point as zero (issue 9252) This is consistent with the behaviour in Adobe Reader.	2018-06-20 13:41:21 +02:00
Jonas Jenwald	bf0db0fb72	Pass the `ignoreErrors` API option to the `FontFaceObject` constructor, and utilize it in `getPathGenerator` to ignore missing glyphs Obviously it's still not possible to render non-embedded fonts as paths, but in this way the rest of the page will at least be allowed to continue rendering. Please note: Including the 14 standard fonts in PDF.js probably wouldn't be that difficult to implement. (I'm not a lawyer, but the fonts from PDFium could probably be used given their BSD license.) However, the main blocker ought to be the total size of the necessary font data, since I cannot imagine people being OK with shipping ~5 MB of (additional) font data with Firefox. (Based on the reactions when the CMap files were added, and those are only ~1 MB in size.)	2018-06-13 11:02:06 +02:00
Jonas Jenwald	620f65488b	Ignore the rest of the image when encountering an EOI (End of Image) marker while parsing Scan data (issue 9679)	2018-05-30 22:40:11 +02:00
Jani Pehkonen	8ea505545a	Use FDSelect and FDArray when converting CFF CID font to paths	2018-04-10 16:44:42 +03:00
Jonas Jenwald	d431ae069d	Attempt to handle corrupt PDF documents that inline Page dictionaries in a Kids array (issue 9540) According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1942297, the contents of a Kids array should be indirect objects.	2018-03-12 14:13:23 +01:00
Jonas Jenwald	f05e5c5460	Take the dictionary, and not just the image data, into account when caching inline images (issue 9398) The reason for the bug is that we're only computing a checksum of the image data itself, but completely ignore the inline dictionary. The latter is important, since in practice it's not uncommon for inline images to be identical but use e.g. different ColourSpaces. There's obviously a couple of different ways that we could compute a hash/checksum of the dictionary. Initially I tried using `MurmurHash3_64` to compute a hash of the keys/values in the dictionary. Unfortunately this approach turned out to be way too slow in practice, especially for PDF files with a huge number of inline images; in particular issue 2618 would regress quite badly with this solution. The solution that is instead implemented in this patch, is to compute a checksum of the dictionary contents. While this is a much simpler, not to mention a lot more efficient, solution there's one drawback associated with it: If the contents of inline image dictionaries are ordered differently, they will not be considered equal with this approach which could thus lead to failures to cache repeated inline images. In practice this doesn't seem to be a problem in any of the PDF files I've tested, and generally I'd rather err on the side of not caching given that too aggressive caching can easily lead to rendering bugs. One small, but somewhat annoying, complication is that by the time `Parser.makeInlineImage` is called, we no longer know the exact stream position where the inline image dictionary starts. Having access to that information is crucial here, and the easiest solution I could come up with is to track this in the current `Lexer` instance.[1] With the patch, we're thus able to fix the referenced issues without incurring large regressions in problematic cases such as issue 2618. Fixes 9398; also improves/fixes the `issue8823` reference test. 
--- [1] Obviously I'd have preferred if this patch could be limited to `Parser.makeInlineImage`, without the need for this "hack", but I'm not sure what that'd look like here.	2018-02-12 16:43:47 +01:00
Tim van der Meij	7bb066494f	Merge pull request #9427 from Snuffleupagus/native-JPEG-decoding-fallback Fallback to the built-in JPEG decoder when browser decoding fails, and attempt to handle JPEG images with DNL (Define Number of Lines) markers (issue 8614)	2018-02-09 21:36:08 +01:00
Jonas Jenwald	a18c65ae9f	Use the correct stream position when reading `maxSizeOfInstructions` from the `maxp` table (issue 9458) Please refer to the `maxp` table specification, found at https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6maxp.html. Fixes 9458.	2018-02-07 21:57:43 +01:00
Jonas Jenwald	bf4166e6c9	Attempt to handle DNL (Define Number of Lines) markers when parsing JPEG images (issue 8614) Please refer to the specification, found at https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=49 Given how the JPEG decoder is currently implemented, we need to know the value of the scanLines parameter (among others) before parsing of the SOS (Start of Scan) data begins. Hence the best solution I could come up with here, is to re-parse the image in the hopefully rare case of JPEG images that include a DNL (Define Number of Lines) marker. Fixes 8614.	2018-02-05 21:05:32 +01:00
Jonas Jenwald	f4a95de694	Attempt to find the next valid marker when encountering invalid image data in `JpegImage.parse` (issue 9425) In the JPEG images in the referenced PDF file, the DHT (Define Huffman Tables) segments contain more data than expected based on the length parameter. Fixes 9425.	2018-02-03 16:01:19 +01:00
Jani Pehkonen	5593c970e0	Implement Huffman coding in JBIG2	2018-01-23 17:04:07 +02:00
Jonas Jenwald	d0c8992e8a	Attempt to actually resolve ColourSpace names in accordance with the specification (issue 9285) Please refer to the PDF specification, in particular http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3801570 > A colour space shall be specified in one of two ways: > - Within a content stream, the CS or cs operator establishes the current colour space parameter in the graphics state. The operand shall always be name object, which either identifies one of the colour spaces that need no additional parameters (DeviceGray, DeviceRGB, DeviceCMYK, or some cases of Pattern) or shall be used as a key in the ColorSpace subdictionary of the current resource dictionary (see 7.8.3, "Resource Dictionaries"). In the latter case, the value of the dictionary entry in turn shall be a colour space array or name. A colour space array shall never be inline within a content stream. > > - Outside a content stream, certain objects, such as image XObjects, shall specify a colour space as an explicit parameter, often associated with the key ColorSpace. In this case, the colour space array or name shall always be defined directly as a PDF object, not by an entry in the ColorSpace resource subdictionary. This convention also applies when colour spaces are defined in terms of other colour spaces.	2018-01-10 20:20:43 +01:00
Jonas Jenwald	d6c028b946	Add support for TrueType Collection fonts (issue 9262) The specification can be found at https://www.microsoft.com/typography/otspec/otff.htm, under the "Font Collections" heading. Fixes 9262.	2018-01-08 22:31:08 +01:00
Jonas Jenwald	c5700211d6	Adjust `decodeACSuccessive` in src/core/jpg.js to improve the rendering quality of (progressive) JPEG images I've been looking into the remaining point in 8637 about blurry images, to see if we could perhaps improve the rendering quality slightly there. After quite a bit of debugging, it seems that the issue is limited to certain progressive JPEG images. As mentioned previously, I've got no detailed knowledge of the JPEG format, but this patch does seem to improve things quite a bit for the images in question. Squinting at https://searchfox.org/mozilla-central/rev/6c33dde6ca02b389c52e8db3d22494df8b916f33/media/libjpeg/jdphuff.c#492-639, it seems reasonable that we should take the sign of the data into account. Furthermore, looking at the specification in https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=118, the "F.2.4.3 Decoding the binary decision sequence for non-zero DC differences and AC coefficients" section even contains a description of this (even though I cannot claim to really understand the details).	2017-12-30 15:24:09 +01:00
Jonas Jenwald	8c4b7d0439	Avoid truncating JPEG images with DeviceGray ColourSpaces when using the `src/core/jpg.js` built-in decoder The bug that this patch fixes is limited to the built-in JPEG decoder, and was unearthed by PR 9260. The underlying issue has existed since PR 6984, where the contents of this patch ought to have been included (if it weren't for the fact that we had no easy way to test `src/core/jpg.js` back then). Please note: The slight movement in the reference test is a result of using the `src/core/jpg.js` decoder, rather than the native browser one.	2017-12-29 18:44:07 +01:00
Jonas Jenwald	06605abbc2	Avoid rendering errors by passing in the `webGLContext` when creating a new `CanvasGraphics` in `getColorN_Pattern` (PR 9095 follow-up) This was an oversight in PR 9095, which unfortunately breaks rendering in some PDF files (e.g. the one from issue 6737). It thus appears that we don't have any test-coverage for this code-path, and given the relative complexity of the PDF files affected by this bug I wasn't able to easily create a reduced test-case. Please note: The linked test-case included in this patch is currently not rendered correctly (that'd be the PR 6606), but it at least gives us some test-coverage here.	2017-12-27 13:50:53 +01:00
Jonas Jenwald	d4cd44fd16	Add a fallback for non-embedded LucidaSans-Demi fonts (issue 9291) The PDF file in the issue uses a number of embedded versions of Lucida fonts, but for some reason does not embed the LucidaSans-Demi font. According to https://en.wikipedia.org/wiki/Lucida#Usages that one should be bold, so we can at least improve rendering here (even though it won't look perfect). Fixes 9291.	2017-12-24 17:36:58 +01:00
Jonas Jenwald	1dc54ddb40	Handle PDF files with missing 'endobj' operators, by searching for the "obj" string rather than "endobj" in `XRef.indexObjects` (issue 9105) This patch refactors the searching for 'endobj', to try and find the next occurrence of "obj" and then check if it was in fact an 'endobj' and continue searching otherwise. This approach is used to avoid having to first find 'endobj', and then re-check the entire contents of the object and having to run (potentially expensive) regular expressions on arbitrarily long strings. Fixes 9105.	2017-12-18 13:17:45 +01:00
Brendan Dahl	9b51cea724	Fix loca table when offsets aren't in ascending order.	2017-12-15 11:20:28 -06:00
Brendan Dahl	af1d80d45e	Merge pull request #9230 from Snuffleupagus/issue-9195 Add basic support for non-embedded Calibri fonts (issue 9195)	2017-12-08 10:15:43 -08:00
Jonas Jenwald	a5e3261b48	Merge pull request #9062 from mozilla/no_high Move char codes from high surrogate pair range into private use.	2017-12-08 12:31:22 +01:00
Brendan Dahl	306999c325	Move char codes from high surrogate pair range into private use. Fixes #2884	2017-12-07 10:35:50 -08:00
Jonas Jenwald	08de655177	Add basic support for non-embedded Calibri fonts (issue 9195) There's a number of issues with the fonts in the referenced PDF file. First of all, they contain broken `ToUnicode` data (`NUL` bytes all over the place). However even if you skip those, the `ToUnicode` data appears to contain nothing but an `IdentityH` CMap which won't help provide a proper glyph mapping. The real issue actually turns out to be that the PDF file uses the "Calibri" font[1], but doesn't include any font files. Since that one isn't a standard font, and uses a fairly different CID to GID map compared to the standard fonts, we're not able to render the file even remotely correctly. To work around this, I'm thus proposing that we include an (incomplete) glyph map for Calibri, and fallback to the standard Helvetica font. Obviously this isn't going to look perfect, but it's really the best that we can hope to achieve given that the PDF file is missing the necessary font data. Finally, please note that none of the PDF readers I've tried (Adobe Reader, PDFium in Chrome) were able to extract the text (which isn't very surprising, given the broken `ToUnicode` data). Fixes 9195. --- [1] According to Wikipedia, see https://en.wikipedia.org/wiki/Calibri, Calibri is (primarily) a Windows font.	2017-12-03 17:23:33 +01:00
Jonas Jenwald	f3c50fe2f9	Merge pull request #9192 from Snuffleupagus/issue-8229 Build a fallback `ToUnicode` map for simple fonts (issue 8229)	2017-11-30 10:27:32 +01:00
Tim van der Meij	e320243870	Merge pull request #9206 from janpe2/svg-inv-images Fix inverted 1-bit images in SVG backend	2017-11-28 22:46:43 +01:00
Jani Pehkonen	58b214eab3	Fix inverted 1-bit images in SVG backend	2017-11-28 21:24:27 +02:00
Jani Pehkonen	06d083b04b	Fix pattern-filled text	2017-11-28 19:40:22 +02:00
Jonas Jenwald	61e19bee43	Build a fallback `ToUnicode` map for simple fonts (issue 8229) In some fonts, the included `ToUnicode` data is incomplete causing text-selection to not work properly. For simple fonts that contain encoding data, we can manually build a `ToUnicode` map to attempt to improve things. Please note that since we're currently using the `ToUnicode` data during glyph mapping, in an attempt to avoid rendering regressions, I purposely didn't want to amend the original `ToUnicode` data for this text-selection edge-case. Instead, I opted for the current solution, which will (hopefully) give slightly better text-extraction results in PDF files with incomplete `ToUnicode` data. According to the PDF specification, see [section 9.10.2](http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1873172): > A conforming reader can use these methods, in the priority given, to map a character code to a Unicode value. > ... Reading that paragraph literally, it doesn't seem too unreasonable to use different methods for different charcodes. Fixes 8229.	2017-11-26 14:45:15 +01:00
Tim van der Meij	0fe80df2a7	Button widget annotations: implement support for pushbuttons	2017-11-26 14:09:48 +01:00
Jonas Jenwald	83e8398ff2	For non-embedded fonts, map softhyphen (0x00AD) to regular hyphen (0x002D) (issue 9084) In the PDF file, the `ToUnicode` data first maps the hyphen correctly, and then overwrites it to point to the softhyphen instead. That one cannot be rendered in browsers, and an empty space thus appears instead. Fixes 9084.	2017-10-31 13:26:04 +01:00
Jonas Jenwald	92fcfce685	Merge pull request #9082 from brendandahl/issue7562 Overwrite glyphs contour count if it's less than -1.	2017-10-30 20:44:01 +01:00
Brendan Dahl	17037b5e51	Overwrite glyphs contour count if it's less than -1. The test pdf has a contour count of -70, but OTS doesn't like values less than -1. Fixes issue #7562.	2017-10-30 09:16:51 -07:00
Jonas Jenwald	d71a576b30	Merge pull request #9045 from brendandahl/sani-name Sanitize name index in compile phase of CFF.	2017-10-24 11:48:03 +02:00
Brendan Dahl	6b12612a52	Sanitize name index in compile phase of CFF. Fixes #8960	2017-10-23 17:13:49 -07:00
Brendan Dahl	fcc9943d04	Use charstring as plain text when lengthIV is -1. Fixes #7769	2017-10-18 14:19:59 -07:00
Jonas Jenwald	b1472cddbb	Allow `getOperatorList`/`getTextContent` to skip errors when parsing broken XObjects (issue 8702, issue 8704) This patch makes use of the existing `ignoreErrors` property in `src/core/evaluator.js`, see PRs 8240 and 8441, thus allowing us to attempt to recover as much as possible of a page even when it contains broken XObjects. Fixes 8702. Fixes 8704.	2017-09-29 17:14:21 +02:00
Brendan Dahl	18e2321845	Overwrite maxSizeOfInstructions in maxp with computed value. In issue #7507 the value is less than the actual max size of the glyph instructions, causing OTS to fail the font.	2017-09-25 17:53:26 -07:00
Jonas Jenwald	10727572a2	Merge pull request #8950 from timvandermeij/polygon-polyline-annotations Implement support for polyline and polygon annotations	2017-09-24 15:16:14 +02:00
Tim van der Meij	c69a7a83da	Merge pull request #8932 from janpe2/jbig2-sym-offset JBIG2 symbol offsets	2017-09-23 17:11:45 +02:00
Tim van der Meij	ed8c0ebfa7	Implement reference tests for polyline and polygon annotations	2017-09-23 17:01:19 +02:00
Jonas Jenwald	abc864fca9	Merge pull request #8938 from brendandahl/bug1392647 Use font's default width even when 0. (bug 1392647)	2017-09-20 22:38:39 +02:00
Brendan Dahl	10ba292b46	Use font's default width even when 0. Bug 1392647 has a PDF where the default width of the font is 0. It draws some charcodes that don't have glyphs, but we were wrongly using the 1000 default width for these charcodes causing some text to be overlapping.	2017-09-20 11:38:30 -07:00
Jani Pehkonen	5d1074c110	Fix JBIG2 symbol offsets in text regions	2017-09-19 23:43:23 +03:00
Jani Pehkonen	3d99b8d706	CCITTFaxStream problem when EndOfBlock is false	2017-09-19 22:19:40 +03:00
Tilman Hausherr	d75a497a6b	support tiff predictor for 16bit (for issue #6289) This does the same for 16 bit as the existing 8 bit tiff predictor code, an addition of the last word to this word. The last two "& 0xFF" may or may not be needed, I see this isn't done in the 8 bit code, but I'm not a JS developer.	2017-09-18 22:24:25 +02:00
Tim van der Meij	400e4aae0e	Implement support for stamp annotations	2017-09-16 16:37:50 +02:00
Jonas Jenwald	eece66fa3e	For /Filter entries containing `Name`s, ignore the /DecodeParms entry if it contains an Array (issue 8895)	2017-09-15 23:02:16 +02:00
Jonas Jenwald	f2618eb2e4	Merge pull request #8808 from janpe2/issue8741 Fix color of image masks inside uncolored patterns	2017-09-12 14:27:56 +02:00
Tim van der Meij	320779e6ed	Merge pull request #8691 from timvandermeij/square-circle-annotations Implement support for square and circle annotations	2017-09-09 22:56:54 +02:00
Tim van der Meij	c04f9d6098	Implement reference tests for square and circle annotations	2017-09-09 21:36:28 +02:00
Jonas Jenwald	7115e136e4	Hide unsupported `LinkAnnotation`s (issue 3897) Rather than displaying links that do nothing when clicked, it probably makes more sense to simply not render them instead. Especially since it turns out that, at least at this point in time, this is very easy to both implement and test. Fixes 3897.	2017-09-06 12:52:56 +02:00
Jani Pehkonen	86020396cb	Fix color of image masks inside uncolored patterns	2017-09-06 13:41:48 +03:00
Jonas Jenwald	49b8cd5a6a	Attempt to improve the `EI` detection heuristics, for inline images, in streams containing `NUL` bytes (issue 8823) Since this patch will now treat (some) `NUL` bytes as "ASCII", the number of `followingBytes` checked are thus increased to (hopefully) reduce the risk of introducing new false positives. Fixes 8823.	2017-08-27 12:48:28 +02:00
Tim van der Meij	798e46da97	Merge pull request #8821 from Snuffleupagus/issue-8798-reduced-test Replace the test-case for issue 8798 with a reduced one (PR 8800 follow-up)	2017-08-26 00:00:45 +02:00
Jonas Jenwald	4660cf8238	Prevent an infinite loop in `XRef.readXRef` by keeping track of already parsed tables (bug 1393476) With this patch, not only is the infinite loop prevented, but we're also able to actually render the file (which e.g. Adobe Reader isn't able to). Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1393476.	2017-08-24 19:18:08 +02:00
Jonas Jenwald	4891b9c7e0	Replace the test-case for issue 8798 with a reduced one (PR 8800 follow-up) Re: issue 8798 and PR 8800. Big thanks to @THausherr for providing the test-case.	2017-08-24 17:43:05 +02:00
Tim van der Meij	e9ba54940d	Merge pull request #8800 from Snuffleupagus/issue-8798 Try to recover if we reach the end of the stream when searching for the `EI` marker of an inline image (issue 8798)	2017-08-23 23:47:51 +02:00
Jonas Jenwald	cb55506b95	Try to recover if we reach the end of the stream when searching for the `EI` marker of an inline image (issue 8798)	2017-08-22 09:33:13 +02:00
Jani Pehkonen	9a581ee9ed	Implement JBIG2 halftone regions and pattern dictionaries	2017-08-08 15:38:29 +03:00
Brendan Dahl	0bef50d56d	Fix two cmap related issues. In issue #8707, there's a char code mapped to a non-existing glyph which shouldn't be drawn. However, we saw it was missing and tried to then use the post table and end up mapping it incorrectly. This illuminated a problem with issue #5704 and bug 893730 where glyphs disappeared after above fix. This was from the cmap returning the wrong glyph id. Which in turn was caused because the font had multiple of the same type of cmap table and we were choosing the last one. Now, we instead default to the first one. I'm unsure if we should instead be merging the multiple cmaps, but using only the first one works.	2017-08-03 22:19:36 -07:00
Jonas Jenwald	23ec6b16ca	Add a fallback for non-embedded SegoeUISymbol font (issue 8697) The PDF file uses a non-embedded SegoeUISymbol font, which is not a standard font (and is mainly used by Microsoft, see https://en.wikipedia.org/wiki/Segoe). Fixes 8697.	2017-07-25 12:45:11 +02:00
Jonas Jenwald	794b099385	Add a reduced test-case for issue 7696 Issue 7696 was one of the issues fixed by PR 8580. The other ones were all cases of missing glyphs, however in this particular one glyphs did render but every single one was incorrect. Hence it probably cannot hurt to have a small, reduced, reference test for that PDF file as well.	2017-07-24 09:55:16 +02:00
Jonas Jenwald	11e95712d4	Add support for the `nativeImageDecoderSupport` parameter, to force JPEG image decoding using `src/core/jpg.js`, when running the reference tests	2017-07-11 16:38:49 +02:00
Jonas Jenwald	ea71d23f74	Fix a stupid spelling error in the `ASCII85Decode` name in `Parser.makeInlineImage` (issue 8613) This is a trivial follow-up to PR 5383, and it's a bit strange that this has been wrong since late 2014 without anyone noticing (maybe because inline images aren't too common). So, apparently code works better if you actually spell correctly, who knew ;-) Fixes 8613.	2017-07-05 19:43:09 +02:00
Jonas Jenwald	eff257b820	Merge pull request #8580 from brendandahl/missing-glyf Fix how we detect and handle missing glyph data.	2017-07-04 12:16:07 +02:00
Brendan Dahl	efbbd8533f	Only mask char codes of (3, 0) cmap tables in the range of 0xF000 to 0xF0FF.	2017-07-03 13:13:46 -07:00
Brendan Dahl	6d4f748fb1	Fix how we detect and handle missing glyph data.	2017-07-03 13:06:06 -07:00
Brendan Dahl	a8a8909d2d	Fix missing notdef in expert encoding.	2017-06-29 12:12:39 -07:00
Brendan Dahl	f1f9d98519	Merge pull request #8507 from Snuffleupagus/issue-8480 Only special-case OpenType fonts with `CFF` data if it's both a composite (i.e. Type0) font and also has a non-default CID to GID map (issue 8480)	2017-06-23 13:36:58 -07:00
Rob Wu	fc6448d18c	Move svg:clipPath generation from clip to endPath In the PDF from issue 8527, the clip operator (W) shows up before a path is defined. The current SVG backend however expects a path to exist before generating a `<svg:clipPath>` element. In the example, the path was defined after the clip, followed by an endPath operator (n). So this commit fixes the bug by moving the path generation logic from clip to endPath. Our canvas backend appears to use similar logic: `CanvasGraphics_endPath` calls `consumePath`, which in turn draws the clip and resets the `pendingClip` state. The canvas backend calls `consumePath` from multiple other places, so we probably need to check whether doing so is also necessary for the SVG backend. I scanned our corpus of PDF files in test/pdfs, and found that in every instance (except for one), the "W" PDF operator (clip) is immediately followed by "n" (endPath). The new test from this commit (clippath.pdf) starts with "W", followed by a path definition and then "n". # Commands used to find some of the clipping commands: grep -ra '^W$' -C7 | less -S grep -ra '^W ' -C7 | less -S grep -ra ' W$' -C7 | less -S test/pdfs/issue6413.pdf is the only file where "W" (at line 55) is not followed by "n". In fact, the "W" is the last operation of a series of XObject painting operations, and removing it does not have any effect on the rendered PDF (confirmed by looking at the output of PDF.js's canvas backend, and ImageMagick's convert command).	2017-06-22 01:08:17 +02:00
Jonas Jenwald	e589834f13	Ensure that `TilingPattern`s have valid (non-zero) /BBox arrays (issue 8330) Fixes 8330.	2017-06-09 21:41:48 +02:00
Jonas Jenwald	8b4a42e5b8	Only special-case OpenType fonts with `CFF` data if it's both a composite (i.e. Type0) font and also has a non-default CID to GID map (issue 8480) As mentioned the last time that I touched this particular part of the font code, I sincerely hope that this doesn't cause any regressions! However, the patch passes all tests added in PRs 5770, 6270, and 7904 (and obviously all other tests as well). Furthermore, I've manually checked all the issues/bugs referenced in those PRs without finding any issues. Fixes 8480.	2017-06-09 21:15:39 +02:00
Jonas Jenwald	4ce5e520fb	Add different code-paths to `{CMap, ToUnicodeMap}.charCodeOf` depending on length, since `Array.prototype.indexOf` can be extremely inefficient for very large arrays (issue 8372) Fixes 8372.	2017-05-24 19:47:04 +02:00
Jonas Jenwald	ac942ac657	Merge pull request #8437 from yurydelendik/default-ctx Resets canvas 2d context to the default state.	2017-05-23 23:31:57 +02:00
Yury Delendik	a67198895f	Resets canvas 2d context to the default state.	2017-05-23 15:10:30 -05:00
Jonas Jenwald	31c24ed631	Don't map glyphs to the HANGUL FILLER (0x3164) Unicode location (issue 8424) This patch follows a similar pattern as previous ones, by skipping certain problematic Unicode locations. According to http://searchfox.org/mozilla-central/rev/6c2dbacbba1d58b8679cee700fd0a54189e0cf1b/gfx/harfbuzz/src/hb-unicode-private.hh#136, it seems that the HANGUL FILLER (0x3164) location is "special". Fixes 8424.	2017-05-23 16:12:45 +02:00
Jonas Jenwald	0c2ebda31c	Cache JPEG images, just as we do for other image formats, in `evaluator.js` (issue 8380) For some reason, we're putting all kinds of images except JPEG into the `imageCache` in `evaluator.js`.[1] This means that in the PDF file in issue 8380, we'll keep sending the same two small images[2] to the main-thread and decoding them over and over. This is obviously hugely inefficient! As can be seen from the discussion in the issue, the performance becomes extremely bad if the user has the addon "Adblock Plus" installed. However, even in a clean Firefox profile, the performance isn't that great. This patch not only addresses the performance implications of the "Adblock Plus" addon together with that particular PDF file, but it also improves the rendering times considerably for all users. Locally, with a clean profile, the rendering times are reduced from `~2000 ms` to `~500 ms` for my setup! Obviously, the general structure of the PDF file and its operator sequence is still hugely inefficient, however I'd say that the performance with this patch is good enough to consider the issue (as it stands) resolved.[3] Fixes 8380. --- [1] Not technically true, since inline images are cached from `parser.js`, but whatever :-) [2] The two JPEG images have dimensions 1x2, respectively 4x2. [3] To make this even more efficient, a new state would have to be added to the `QueueOptimizer`. Given that PDF files this stupid fortunately aren't too common, I'm not convinced that it's worth doing.	2017-05-07 13:07:41 +02:00
Jonas Jenwald	40feca12c1	Ignore line-breaks between operator and digit in `Lexer.getNumber` This is consistent with the behaviour in Adobe Reader (and PDFium), and it fixes the display of page 30 in https://bug1354114.bmoattachments.org/attachment.cgi?id=8855457 (taken from https://bugzilla.mozilla.org/show_bug.cgi?id=1354114). The patch also makes the `error` message for invalid numbers slightly more useful, by including the charCode as well. (Having that information available would have reduced the time spent on debugging the PDF file above.)	2017-05-02 20:59:42 +02:00
Jani Pehkonen	64deb6c700	Subtract the X/Y offsets when decoding refinement regions of JBIG2 images (issue 7145, 7308, 7401, 7850, 8270) Please refer to the JBIG2 standard, see https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-T.88-200002-I!!PDF-E&type=items. In particular, section "6.3.5.3 Fixed templates and adaptive templates" mentions that the offsets should be subtracted; where the offsets are defined according to "Table 6" under section "6.3.2 Input parameters". Fixes 7145. Fixes 7308. Fixes 7401. Fixes 7850. Fixes 8270.	2017-04-26 16:06:15 +02:00
Yury Delendik	c4c44c1bbe	Merge pull request #8240 from Snuffleupagus/api-stopAtErrors [api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815)	2017-04-13 10:58:49 -05:00
Tim van der Meij	32e01cda96	Merge pull request #8228 from timvandermeij/line-annotations Implement support for line annotations	2017-04-13 00:18:31 +02:00
Tim van der Meij	e15a2ec523	Annotations: implement support for line annotations This patch implements support for line annotations. Other viewers only show the popup annotation when hovering over the line, which may have any orientation. To make this possible, we render an invisible line (SVG element) over the line on the canvas that acts as the trigger for the popup annotation. This invisible line has the same starting coordinates, ending coordinates and width of the line on the canvas.	2017-04-12 23:05:25 +02:00
Jonas Jenwald	a39d636eb8	[api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815) Other PDF readers, e.g. Adobe Reader and PDFium (in Chrome), will attempt to render as much of a page as possible even if there are errors present. Currently we just bail as soon as the first error is hit, which means that we'll usually not render anything in these cases and just display a blank page instead. NOTE: This patch changes the default behaviour of the PDF.js API to always attempt to recover as much data as possible, even when encountering errors during e.g. `getOperatorList`/`getTextContent`, which thus improves our handling of corrupt PDF files and allows the default viewer to handle errors slightly more gracefully. In the event that an API consumer wishes to use the old behaviour, where we stop parsing as soon as an error is encountered, the `stopAtErrors` parameter can be set at `getDocument`. Fixes, inasmuch it's possible since the PDF files are corrupt, e.g. issue 6342, issue 3795, and [bug 1130815](https://bugzilla.mozilla.org/show_bug.cgi?id=1130815) (and probably others too).	2017-04-11 08:59:22 +02:00
Brendan Dahl	4969b2ad97	Normalize blend mode names.	2017-04-10 16:18:08 -07:00
Brendan Dahl	cdc79a4721	Don’t skip glyph 0 in cmap.	2017-04-05 15:17:38 -07:00
Jonas Jenwald	62eee8c782	Try harder to find the next valid JPEG marker when decoding Scan data (issue 8182, issue 8189) Tentatively fixes 8182 and fixes 8189.	2017-03-27 15:55:21 +02:00
Jonas Jenwald	3705e5e459	Use a proper `MessageHandler` for `PartialEvaluator.getTextContent` to avoid errors for fonts relying on built-in CMap files (PR 8064 follow-up) My apologies for inadvertently breaking this in PR 8064; apparently we don't have any tests that cover this use-case :( Without this patch `getTextContent` will fail if called before `getOperatorList`, since loading of fonts during text-extraction may require fetching of built-in CMap files. Please note: The `text` test added here, which uses an already existing PDF file, fails without this patch.	2017-03-24 17:39:33 +01:00
Jonas Jenwald	e2e13df4a5	Merge pull request #8164 from Snuffleupagus/issue-7828 Don't read past the EOI marker for JPEG images with non-default restart interval (issue 7828)	2017-03-20 22:17:28 +01:00
Jonas Jenwald	d6d0f778aa	Don't read past the EOI marker for JPEG images with non-default restart interval (issue 7828) After browsing through (a version of) the JPEG specification, see https://www.w3.org/Graphics/JPEG/itu-t81.pdf, I hope that this patch makes sense. Note that while issue 7828 became a problem after PR 7661, it isn't really a regression from that PR. The explanation is rather that we're now relying on `core/jpg.js` instead of the Native Image decoder in more situations than before, which thus exposed an existing issue in our JPEG decoder. Another factor also seems to be that in many JPEG images, the DRI (Define Restart Interval) marker isn't present, in which case this bug won't manifest either. According to https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=89 (at the bottom of the page): "NOTE – The final restart interval may be smaller than the size specified by the DRI marker segment, as it includes only the number of MCUs remaining in the scan." Furthermore, according to https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=39 (in the middle of the page): "[...] If restart is enabled and the restart interval is defined to be Ri, each entropy-coded segment except the last one shall contain Ri MCUs. The last one shall contain whatever number of MCUs completes the scan." Based on the above, it thus seems to me that we should simply ensure that we're not attempting to continue to parse Scan data once we've found all MCUs (Minimum Coded Unit) of the image. Fixes 7828.	2017-03-20 17:16:33 +01:00
Jonas Jenwald	be1a6f294f	Try to recover when encountering JPEG markers with too short marker lengths (issue 8169) The issue with the JPEG image in question, is that the COM (Comment) marker has an incorrect length entry. Fixes 8169.	2017-03-20 17:05:51 +01:00
Jonas Jenwald	098a56270d	Normalize the `BBox` entry in Tiling Pattern dictionaries (issue 8117) According to the PDF specification, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3982967, the `BBox` entry should have the form `[left, bottom, right, top]`. Since some PDF generators apparently violate the specification, we normalize the `BBox` to ensure that the pattern is (correctly) rendered. Fixes 8117.	2017-03-16 21:43:11 +01:00
Jason O. Jensen	d230784ac3	Handle cff fonts with erroneous stackSize	2017-03-06 19:28:46 -05:00
Jonas Jenwald	4a0ff5dbf7	Ensure that we don't ignore `0` values in `Page.getInheritedPageProp` (issue 8125) It appears that I accidentally broke this in PR 6065, sorry about that! The issue in this particular PDF file is that there are `/Rotate` entries on different levels of the `/Pages` tree. We're supposed to use the `/Rotate` entry in the `/Page` dict (which is `0`), but because of an incorrect condition we instead ended up with the one from the `/Pages` dict (which is `180`). Fixes 8125.	2017-03-03 12:27:40 +01:00
Jonas Jenwald	1ce295541c	Always check all Kids nodes, in `Catalog.getPageDict`, to avoid getting stuck in an empty node further down in the Pages tree (issue 8088) As discussed on IRC, we need to check all nodes at the bottom of the tree to ensure that we find the correct `Page` dict. Furthermore, this patch also gets rid of the caching present in a previous version, since it's not clear if that really helps. Note that this patch purposely adds an `eq` test, using a reduced test-case, so that we can be sure that the algorithm actually finds the correct `Page` dict for each `pageIndex`. Fixes 8088.	2017-02-24 12:09:46 +01:00
Jonas Jenwald	ce072022c1	Always choose a (3, 1) cmap table for TrueType fonts that have an encoding specified, regardless of the Symbolic font flag (bug 1337429) This patch basically reverts one aspect of TrueType (3, 1) cmap parsing to the state prior to PR 4259. After that PR, a number of regressions occurred in this particular code-path, which necessitated a number of follow-ups such as PRs 5703, 5743, and 6425. The empirical data suggests, at least to me, that we should always prefer a (3, 1) cmap for TrueType fonts when they have an encoding, regardless of the Symbolic font flag. Obviously this patch passes all unit/font/reference tests locally, and I made sure that all the PRs mentioned above landed with test-cases included. However, in my opinion, there's still a very real possibility that this patch could potentially cause new regressions. Given that the PDF file in bug 1337429 has been broken for almost three years before anyone noticed, and considering that the code-path in question has been the source of numerous regressions, I do not intend to request uplift of this patch to previous Firefox versions (assuming that it's even accepted). Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1337429.	2017-02-15 17:38:08 +01:00
Jonas Jenwald	23c62cc321	Consume the current character when encountering illegal characters in `Lexer.getObject`, in order to prevent infinite loops during reading of streams (issue 8061) Please note: The rendering of the PDF file in issue 8061 first regressed in PR 7039, and then PR 7493 exacerbated the problem even further by causing an infinite loop. In this particular case, when errors were encountered inside of the `Lexer.getObject` method itself, we didn't advance the stream position. This thus caused an infinite loop in `parseCMap`, since the exact same character was then parsed over and over again. Fixes 8061.	2017-02-11 19:32:48 +01:00
pmysore1	af8292058f	Font ascent descent calculation fix	2017-02-11 01:25:05 -05:00
Tim van der Meij	1fda987a4c	Merge pull request #7904 from Snuffleupagus/issue-7901 Further adjust the heuristics used to detect OpenType font files with CFF data, to ensure that all Type0 fonts are handled the same way regardless of font Subtype (issue 7901)	2017-01-12 21:55:57 +01:00
Yury Delendik	393740e2ae	Merge pull request #7869 from PedroPachecoInf/master Fixes issue #6071 - TIFF with 1 bit-depth	2017-01-10 12:37:26 -06:00
jazzchipc	493853031b	Fixes issue #6071. Corrects readBlockTiff() case for 1-bit depth and 1 color TIFF images incorporated in the PDF. Adds reference test for PDF used to fix this issue.	2017-01-10 16:42:43 +00:00
Jonas Jenwald	e963971244	Further adjust the heuristics used to detect OpenType font files with CFF data, to ensure that all Type0 fonts are handled the same way regardless of font Subtype (issue 7901) Changing this particular code makes me somewhat nervous about regressions, since PR 5770 necessitated the follow-up PR 6270. However, the patch passes all tests added in those PRs (and obviously all other tests). Furthermore, I've manually checked all the issues/bugs referenced in PRs 5770 and 6270 without finding any issues. Please note: This patch fixes only the font bug, not the SVG conversion, present on pages two and three of the PDF file in issue 7901.	2016-12-20 17:03:51 +01:00
Yury Delendik	3b3a179486	Merge pull request #7879 from rossj/highlight-fix Make use of textAdvanceScale consistent during combineTextItems. Fix for #7878.	2016-12-19 09:18:13 -06:00
Tim van der Meij	0c9a06c020	Button widget annotations: implement reference testing Moreover, ensure that the read-only state is respected and improve CSS names.	2016-12-17 20:33:35 +01:00
Ross Johnson	4537590033	Consistently apply textAdvanceScale during building of textContentItems for improved highlighting. Fixes #7878.	2016-12-14 21:02:19 -06:00
Jonas Jenwald	9be3aee9c9	Add a parameter to `Page_getInheritedPageProp` to make it possible to fetch (and dereference) Arrays, and use that for the `MediaBox`/`CropBox` getters (issue 7872)	2016-12-08 22:03:42 +01:00
Jonas Jenwald	c5b06cb40d	Ensure that `PartialEvaluator_extractWidths` is able to handle indirect objects in all kinds of "width" data (issue 7855) Fixes 7855.	2016-11-29 20:49:07 +01:00
Jonas Jenwald	451956c0b1	Merge pull request #7628 from Snuffleupagus/issue-7580 Fallback to the `StandardEncoding` for Nonsymbolic fonts without `/Encoding` entry (issue 7580)	2016-11-29 12:37:36 +01:00
Jonas Jenwald	3170a4c40a	Improve rendering of non-embedded NuptialScript font This patch fixes something that I noticed while debugging https://bugzilla.mozilla.org/show_bug.cgi?id=1308536. The PDF file contains a font called "NuptialScript", which unfortunately is not embedded. Since that is a non-standard font we will not be able to render it entirely correctly. However, by adding "NuptialScript" to the `getNonStdFontMap`, we can at least improve the rendering slightly by using an italic (serif) fallback font.	2016-11-22 17:56:17 +01:00
Jonas Jenwald	d3043167de	Correctly detect more cases of non-embedded Arial Black fonts (issue 7835) This patch adds support for non-embedded Arial Black fonts, that use an `Arial-Black...` format for the font names. Also, this patch changes `canvas.js` such that we always render Arial Black fonts with the maximum weight, which actually improves a number of existing test-cases. This should thus explain the test "failures", which are clear improvements compared with e.g. Adobe Reader. Fixes 7835.	2016-11-22 13:56:21 +01:00
Jonas Jenwald	9dc6463933	Ignore reserved commands when parsing operands in `CFFParser_parseDict`, instead of just rejecting the entire font (bug 1308536) According to the CFF specification, see http://partners.adobe.com/public/developer/en/font/5176.CFF.pdf#page=11, certain commands are currently reserved. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1308536.	2016-11-03 12:50:40 +01:00
Yury Delendik	ea5949f1fd	Merge pull request #7668 from Snuffleupagus/issue-7665 Prevent an infinite loop in `XRef_fetchUncompressed` for encrypted PDF files with indirect objects in the /Encrypt dictionary (issue 7665)	2016-10-15 10:52:08 -05:00
Chas Emerick	85c52f1fd6	Fix getTextContent evaluation to only apply TJ horizontal offsets using numeric items/args While the array argument to TJ should only contain strings and numbers, other unfortunate items are found in PDFs in the wild, e.g.: [(Grandes) 0.0 Tc -250.0 (Client\350les,) 0.0 Tc -250.0 (Financements) 0.0 Tc -250.0 (et) 0.0 Tc -250.0 (March\351s) ] TJ getOperatorList already properly ignores any non-string, non-numeric values in TJ arrays; without this patch to getTextContent, returned text items can have NaN widths due to calculations being applied to those non-numeric values.	2016-10-13 08:08:31 -04:00
Tim van der Meij	9b3a91f365	Merge pull request #7671 from timvandermeij/interactive-forms-choice-fields Interactive forms: render choice widget annotations	2016-10-05 23:27:45 +02:00
Tim van der Meij	f85f3243b1	Choice widget annotations: unit and reference testing	2016-10-05 21:25:29 +02:00
Yury Delendik	7b2a9ee4e0	Merge pull request #7670 from Snuffleupagus/Parser_makeFilter-maybeLength Only skip parsing a stream in `Parser_makeFilter` when we know for sure that it is empty (PR 6372 follow-up)	2016-10-05 10:38:12 -05:00
Jonas Jenwald	54ee83eb12	Attempt to skip zero bytes at the end of Scan blocks when decoding JPEG images (issue 4090)	2016-09-28 16:31:02 +02:00
Jonas Jenwald	116ba19dd9	Respect the 'ColorTransform' entry in the image dictionary when decoding JPEG images (bug 956965, issue 6574) Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=956965. Fixes 6574.	2016-09-26 21:55:43 +02:00
Jonas Jenwald	a22f0ae820	Only skip parsing a stream in `Parser_makeFilter` when we know for sure that it is empty (PR 6372 follow-up) For PDF files with multiple `/Filter`s, where the `/Length` entry is zero, we fail to render the file correctly. The reason is that `maybeLength` is `null` for every filter except the first, and `!maybeLength` is thus truthy. Hence it seems that we should completely ignore the `/Length` entry and also explicitly check `maybeLength === 0`. Note that I've not (yet) come across a PDF file with this issue in the wild, but given all the stupid things PDF generators do I wouldn't be surprised if such a file actually exists. In order to prevent a possible future bug, I'm submitting this patch which includes a hand-edited PDF file that we currently cannot render correctly (but e.g. Adobe Reader can).	2016-09-25 12:40:15 +02:00
Jonas Jenwald	4d2de9b47e	Add a reduced `load` test for issue 7665	2016-09-25 00:19:42 +02:00
Jonas Jenwald	6c263c1994	Merge pull request #7649 from timvandermeij/interactive-forms-tx-comb Text widget annotations: implement comb support	2016-09-22 11:36:30 +02:00
Tim van der Meij	6100ab4b18	Text widget annotations: implement comb support	2016-09-20 22:31:10 +02:00
Brendan Dahl	15e1ae4e3f	Merge pull request #7639 from Snuffleupagus/bug-1252420 Replace empty CharStrings with '.notdef' in `Type1Font_wrap` to prevent OTS from rejecting the font (bug 1252420)	2016-09-20 11:56:47 -07:00
Jonas Jenwald	170871ab3d	Prevent rendering `TextWidgetAnnotation`s in both the `core`/`display` layer (issue 7643)	2016-09-18 15:42:22 +02:00
Tim van der Meij	f062695d62	Merge pull request #7633 from timvandermeij/interactive-forms-tx-flags Text widget annotations: support read-only/multiline fields and improve testing	2016-09-17 17:19:47 +02:00
Tim van der Meij	adf0972ca5	Text widget annotations: improve unit and reference tests This patch improves the unit tests by testing the support for read-only and multiline fields. Moreover, we add a reference test to ensure that the text widgets are not only rendered, but also that their contents are styled properly. Finally, we perform minor improvements in `src/core/annotation.js`, for example adding missing comments.	2016-09-17 15:24:48 +02:00

910 Commits