pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	08de655177	Add basic support for non-embedded Calibri fonts (issue 9195) There are a number of issues with the fonts in the referenced PDF file. First of all, they contain broken `ToUnicode` data (`NUL` bytes all over the place). However even if you skip those, the `ToUnicode` data appears to contain nothing but an `IdentityH` CMap which won't help provide a proper glyph mapping. The real issue actually turns out to be that the PDF file uses the "Calibri" font[1], but doesn't include any font files. Since that one isn't a standard font, and uses a fairly different CID to GID map compared to the standard fonts, we're not able to render the file even remotely correctly. To work around this, I'm thus proposing that we include an (incomplete) glyph map for Calibri, and fall back to the standard Helvetica font. Obviously this isn't going to look perfect, but it's really the best that we can hope to achieve given that the PDF file is missing the necessary font data. Finally, please note that none of the PDF readers I've tried (Adobe Reader, PDFium in Chrome) were able to extract the text (which isn't very surprising, given the broken `ToUnicode` data). Fixes 9195. --- [1] According to Wikipedia, see https://en.wikipedia.org/wiki/Calibri, Calibri is (primarily) a Windows font.	2017-12-03 17:23:33 +01:00
Tim van der Meij	70a28ab34f	Implement unit tests for the utility functions `bytesToString` and `stringToBytes`	2017-12-03 12:52:16 +01:00
Jonas Jenwald	f3c50fe2f9	Merge pull request #9192 from Snuffleupagus/issue-8229 Build a fallback `ToUnicode` map for simple fonts (issue 8229)	2017-11-30 10:27:32 +01:00
Tim van der Meij	e320243870	Merge pull request #9206 from janpe2/svg-inv-images Fix inverted 1-bit images in SVG backend	2017-11-28 22:46:43 +01:00
Jani Pehkonen	58b214eab3	Fix inverted 1-bit images in SVG backend	2017-11-28 21:24:27 +02:00
Jani Pehkonen	06d083b04b	Fix pattern-filled text	2017-11-28 19:40:22 +02:00
Jonas Jenwald	61e19bee43	Build a fallback `ToUnicode` map for simple fonts (issue 8229) In some fonts, the included `ToUnicode` data is incomplete causing text-selection to not work properly. For simple fonts that contain encoding data, we can manually build a `ToUnicode` map to attempt to improve things. Please note that since we're currently using the `ToUnicode` data during glyph mapping, in an attempt to avoid rendering regressions, I purposely didn't want to amend the original `ToUnicode` data for this text-selection edge-case. Instead, I opted for the current solution, which will (hopefully) give slightly better text-extraction results in PDF files with incomplete `ToUnicode` data. According to the PDF specification, see [section 9.10.2](http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1873172): > A conforming reader can use these methods, in the priority given, to map a character code to a Unicode value. > ... Reading that paragraph literally, it doesn't seem too unreasonable to use different methods for different charcodes. Fixes 8229.	2017-11-26 14:45:15 +01:00
Tim van der Meij	0fe80df2a7	Button widget annotations: implement support for pushbuttons	2017-11-26 14:09:48 +01:00
Tim van der Meij	25b07812b9	Sanitize the display value for choice widget annotations	2017-11-18 20:37:27 +01:00
Tim van der Meij	edaf4b3173	Merge pull request #9037 from Snuffleupagus/refactor-streams-params Re-factor how parameters are passed to the network streams	2017-11-18 15:41:15 +01:00
Tim van der Meij	9686f6652c	Merge pull request #9089 from yurydelendik/rm-chunks Extracts OperatorList class and prepares for streaming	2017-11-13 23:35:40 +01:00
Jonas Jenwald	23699cef1c	Re-factor how parameters are passed to the network streams This patch is the result of me starting to look into moving parameters from `PDFJS` into `getDocument` and other API methods. When familiarizing myself with the code, the signatures of the various network streams seemed to be unnecessarily cumbersome since `disableRange` is currently handled separately from other parameters. I'm assuming that the explanation for this is probably "for historical reasons", as is often the case. Hence I'd like to clean this up before we start the larger, and more invasive, `PDFJS` parameter re-factoring.	2017-11-11 11:23:29 +01:00
Brendan Dahl	b46443f0c1	Merge pull request #9077 from yurydelendik/v2 Version 2.0 merge	2017-10-31 14:24:20 -07:00
Jonas Jenwald	83e8398ff2	For non-embedded fonts, map softhyphen (0x00AD) to regular hyphen (0x002D) (issue 9084) In the PDF file, the `ToUnicode` data first maps the hyphen correctly, and then overwrites it to point to the softhyphen instead. That one cannot be rendered in browsers, and an empty space thus appears instead. Fixes 9084.	2017-10-31 13:26:04 +01:00
Jonas Jenwald	92fcfce685	Merge pull request #9082 from brendandahl/issue7562 Overwrite glyphs contour count if it's less than -1.	2017-10-30 20:44:01 +01:00
Yury Delendik	85f544f55a	Moves OperatorList and QueueOptimizer into separate file.	2017-10-30 13:29:58 -05:00
Brendan Dahl	17037b5e51	Overwrite glyphs contour count if it's less than -1. The test pdf has a contour count of -70, but OTS doesn't like values less than -1. Fixes issue #7562.	2017-10-30 09:16:51 -07:00
Yury Delendik	b4e25fb2e8	Merge remote-tracking branch 'mozilla/version-2.0' into v2	2017-10-27 14:01:45 -05:00
Jonas Jenwald	d71a576b30	Merge pull request #9045 from brendandahl/sani-name Sanitize name index in compile phase of CFF.	2017-10-24 11:48:03 +02:00
Brendan Dahl	6b12612a52	Sanitize name index in compile phase of CFF. Fixes #8960	2017-10-23 17:13:49 -07:00
Brendan Dahl	fcc9943d04	Use charstring as plain text when lengthIV is -1. Fixes #7769	2017-10-18 14:19:59 -07:00
Tim van der Meij	17cc94db4e	Merge pull request #9034 from Snuffleupagus/javascript-null [api-major] Change `getJavaScript` to return `null`, rather than an empty Array, when no JavaScript exists	2017-10-17 21:58:45 +02:00
Tim van der Meij	7d7edd9cc6	[api-major] Remove the `PDFJS_NEXT` option Nothing uses this option anymore, so setting it is a no-op now. We can safely remove it. Use `SKIP_BABEL` (instead of `PDFJS_NEXT`) now if you want to skip Babel translation for a build.	2017-10-16 23:16:51 +02:00
Jonas Jenwald	1cd1582cb9	[api-major] Change `getJavaScript` to return `null`, rather than an empty Array, when no JavaScript exists Other API methods already return `null`, rather than empty Arrays/Objects, hence it makes sense to change `getJavaScript` to be consistent.	2017-10-15 22:17:14 +02:00
Jonas Jenwald	33b1d1b20a	Fix a `PDFHistory` regression with document hashes of the `nameddest=...` form Unfortunately I've just found out that this isn't working entirely correctly; my apologies for accidentally breaking this in PR 8775. Compare e.g. this link: http://mirrors.ctan.org/info/lshort/english/lshort.pdf#page.157, with this one: http://mirrors.ctan.org/info/lshort/english/lshort.pdf#nameddest=page.157. Notice how in the second case, the history stops working correctly. The various edge-case regressions in the new `PDFHistory` code are reminding me why I put off the rewrite for so long :-(	2017-10-09 21:58:54 +02:00
Tim van der Meij	509d3728f1	Merge pull request #8922 from Snuffleupagus/paintXObject-errors Allow `getOperatorList`/`getTextContent` to skip errors when parsing broken XObjects (issue 8702, issue 8704)	2017-10-07 15:46:26 +02:00
Tim van der Meij	f73c9b75d9	Transform Web Archive URLs to avoid downloading an HTML page instead of the PDF file Moreover, adjust one linked test case that did not conform to the standard Web Archive URL format and adjust one linked test case because the link was dead.	2017-09-30 19:50:31 +02:00
Jonas Jenwald	b1472cddbb	Allow `getOperatorList`/`getTextContent` to skip errors when parsing broken XObjects (issue 8702, issue 8704) This patch makes use of the existing `ignoreErrors` property in `src/core/evaluator.js`, see PRs 8240 and 8441, thus allowing us to attempt to recover as much as possible of a page even when it contains broken XObjects. Fixes 8702. Fixes 8704.	2017-09-29 17:14:21 +02:00
Jonas Jenwald	b8ec518a1e	Split the existing `PDFFunction` in two classes, a private `PDFFunction` and a public `PDFFunctionFactory`, and utilize the latter in `PDFDocument` to allow various code to access the methods of `PDFFunction` Follow-up to PR 8909. This requires us to pass around `pdfFunctionFactory` to quite a lot of existing code, however I don't see another way of handling this while still guaranteeing that we can access `PDFFunction` as freely as in the old code. Please note that the patch passes all tests locally (unit, font, reference), and I very much hope that we have sufficient test-coverage for the code in question to catch any typos/mistakes in the re-factoring.	2017-09-29 15:30:53 +02:00
Jonas Jenwald	a159c4f357	Check that `this.baseUrl` is defined before attempting to fetch any data in `DOMCMapReaderFactory`/`NodeCMapReaderFactory`	2017-09-28 12:34:57 +02:00
Brendan Dahl	18e2321845	Overwrite maxSizeOfInstructions in maxp with computed value. In issue #7507 the value is less than the actual max size of the glyph instructions causing OTS to fail the font.	2017-09-25 17:53:26 -07:00
Jonas Jenwald	10727572a2	Merge pull request #8950 from timvandermeij/polygon-polyline-annotations Implement support for polyline and polygon annotations	2017-09-24 15:16:14 +02:00
Tim van der Meij	c69a7a83da	Merge pull request #8932 from janpe2/jbig2-sym-offset JBIG2 symbol offsets	2017-09-23 17:11:45 +02:00
Tim van der Meij	ed8c0ebfa7	Implement reference tests for polyline and polygon annotations	2017-09-23 17:01:19 +02:00
Tim van der Meij	d7b37ae745	Merge pull request #8912 from timvandermeij/xml-parser [api-minor] Replace `DOMParser` with `SimpleXMLParser`	2017-09-20 23:45:00 +02:00
Jonas Jenwald	abc864fca9	Merge pull request #8938 from brendandahl/bug1392647 Use font's default width even when 0. (bug 1392647)	2017-09-20 22:38:39 +02:00
Brendan Dahl	10ba292b46	Use font's default width even when 0. Bug 1392647 has a PDF where the default width of the font is 0. It draws some charcodes that don't have glyphs, but we were wrongly using the 1000 default width for these charcodes causing some text to be overlapping.	2017-09-20 11:38:30 -07:00
Tim van der Meij	2281061882	Enable metadata unit tests for Travis CI and Node.js	2017-09-19 23:09:07 +02:00
Tim van der Meij	d4309614f9	Replace `DOMParser` with `SimpleXMLParser` The `DOMParser` is most likely overkill and may be less secure. Moreover, it is not supported in Node.js environments. This patch replaces the `DOMParser` with a simple XML parser. This should be faster and gives us Node.js support for free. The simple XML parser is a port of the one that existed in the examples folder with a small regex fix to make the parsing work correctly. The unit tests are extended for increased test coverage of the metadata code. The new method `getAll` is provided so the example does not have to access internal properties of the object anymore.	2017-09-19 23:09:07 +02:00
Jani Pehkonen	5d1074c110	Fix JBIG2 symbol offsets in text regions	2017-09-19 23:43:23 +03:00
Jani Pehkonen	3d99b8d706	CCITTFaxStream problem when EndOfBlock is false	2017-09-19 22:19:40 +03:00
Tilman Hausherr	d75a497a6b	support tiff predictor for 16bit (for issue #6289) This does the same for 16 bit as the existing 8 bit tiff predictor code, an addition of the last word to this word. The last two "& 0xFF" may or may not be needed, I see this isn't done in the 8 bit code, but I'm not a JS developer.	2017-09-18 22:24:25 +02:00
Tim van der Meij	400e4aae0e	Implement support for stamp annotations	2017-09-16 16:37:50 +02:00
Jonas Jenwald	eece66fa3e	For /Filter entries containing `Name`s, ignore the /DecodeParms entry if it contains an Array (issue 8895)	2017-09-15 23:02:16 +02:00
Jonas Jenwald	1ebbdc253a	Use the `SimpleLinkService` when running "annotations" reference tests Rather than (basically) duplicating the `SimpleLinkService` in `test/driver.js`, with potential test failures if you forget to update the test mock, it seems much nicer to just re-use the viewer component. Note that `SimpleLinkService` is already bundled into the `build/components/pdf_viewer.js` file. Hence we only need to expose it similar to the other viewer components in that file, and make sure that the `gulp components` command runs as part of the test-setup.	2017-09-12 15:24:46 +02:00
Jonas Jenwald	f2618eb2e4	Merge pull request #8808 from janpe2/issue8741 Fix color of image masks inside uncolored patterns	2017-09-12 14:27:56 +02:00
Tim van der Meij	23cbe294d5	Combine the common styles and overrides for the annotation layer reference tests This patch allows us to use the common styles as used by the viewer as a baseline for the annotation layer reference tests. They are extended with a small set of overrides to ensure that all elements are visible during the test. The overrides file now only contains the absolutely necessary rules to make all elements visible and is therefore no longer an almost verbatim copy of the common styles.	2017-09-10 18:18:56 +02:00
Tim van der Meij	320779e6ed	Merge pull request #8691 from timvandermeij/square-circle-annotations Implement support for square and circle annotations	2017-09-09 22:56:54 +02:00
Tim van der Meij	c04f9d6098	Implement reference tests for square and circle annotations	2017-09-09 21:36:28 +02:00
Tim van der Meij	f7fd1db52f	Introduce `DOMSVGFactory` This patch provides a new unit tested factory for creating SVG containers and elements. This code is duplicated twice in the codebase, but with upcoming changes this would need to be duplicated even more. Moreover, consolidating this code in one factory allows us to replace it easily for e.g., supporting Node.js. Therefore, move this to a central place and update/ES6-ify the related code. Finally, we replace `setAttributeNS` with `setAttribute` because no namespace is provided.	2017-09-09 21:36:27 +02:00

1617 Commits