pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	44b75c01a1	Check that Type1C fonts do not actually contain OpenType font files (issue 7598) This patch is yet another instalment in the (never ending) series of patches for PDF files that specify completely incorrect Type/Subtype for their fonts. In this case Type1/Type1C, when in fact OpenType would have been correct. Fixes 7598.	2016-09-06 10:13:11 +02:00
Jonas Jenwald	37998076c9	In `display/api.js` ensure that we always reject with an `Error` in `JpegDecode`, and adjust a couple of other rejection sites as well In the case where the document was destroyed, we were rejecting the `Promise` in `JpegDecode` with a string instead of an `Error`. The patch also brings the wording more in line with other such rejections. Use the `isInt` utility function when validating the `pageNumber` parameter in `WorkerTransport_getPage`, to make it more obvious what's actually happening. There are also a couple more unit-tests added, to ensure that we always fail in the expected way. Finally, we can simplify the rejection handling in `WorkerTransport_getPageIndexByRef` somewhat. (Note that the only reason for using `catch` here is that since the promise is rejected on the worker side, the `reason` becomes a string instead of an `Error` which is why we "re-reject" on the display side.)	2016-09-05 16:35:32 +02:00
Tim van der Meij	d03651efff	Merge pull request #7407 from Snuffleupagus/issue-7406 Assign the `quantizationTables` after parsing the entire JPEG image, to prevent issues when the DQT (Define Quantization Tables) marker is encountered after SOF{n} (Start of Frame) markers (issue 7406)	2016-09-04 14:49:01 +02:00
Tim van der Meij	6bb95e3129	Merge pull request #7539 from jeremypress/fairexpand [api-minor] Expanding divs to improve selection	2016-09-01 17:43:31 +02:00
Jeremy Press	1ceeb4d17b	added text enhancement regression tests	2016-08-31 09:54:52 -07:00
Jeremy Press	6faa84abdb	Continuing fairexpand #6663 1. Expanding divs to improve text selection. (Yury) 2. Adding enhanceTextSelection as an option. 3. Moving feature functionality from text_layer_builder.js to text_layer.js. 4. Added expandTextDivs method to only load expanded divs on first click, and only show on subsequent clicks	2016-08-31 09:54:52 -07:00
Jonas Jenwald	3ac23200ba	Add a reduced test-case for issue 7406 The PDF file contains an image that we're allowed to use, since it's just the PDF.js logo. The logo image was simply inverted (so that it requires a /Decode entry in the image dictionary that triggers the use of `jpg.js` instead of the browser), converted to JPEG, and finally edited by hand to change the order of the DQT/SOF{n} markers.	2016-08-31 18:42:07 +02:00
Yury Delendik	ffa99397ad	Merge pull request #7387 from Snuffleupagus/issue-5808 Attempt to ignore multiple identical Tf (setFont) commands in `PartialEvaluator_getTextContent` (issue 5808)	2016-08-30 15:21:41 -05:00
Tim van der Meij	f520616e00	Merge pull request #7570 from Snuffleupagus/issue-7569 Create a fallback annotation `id` for entries in `Annots` dictionaries that are not indirect objects (issue 7569)	2016-08-28 00:23:59 +02:00
Tim van der Meij	b81d661556	Remove unused globals from fonts unit test file	2016-08-27 23:20:03 +02:00
Jonas Jenwald	088ce6c009	Add a unit-test to check that `ProblematicCharRanges` contains valid entries When adding new entries to `ProblematicCharRanges`, you have to be careful to not make any mistakes since that could cause glyph mapping issues. Currently the existing reference tests should probably help catch any errors, but based on experience I think that having a unit-test which specifically checks `ProblematicCharRanges` would be both helpful and timesaving when modifying/reviewing changes to this code. Hence this patch which adds a function (and unit-test) that is used to validate the entries in `ProblematicCharRanges`, and also checks that we don't accidentally add more character ranges than the Private Use Area can actually contain. The way that the validation code, and thus the unit-test, is implemented also means that we have an easy way to tell how much of the Private Use Area is potentially utilized by re-mapped characters.	2016-08-27 11:56:00 +02:00
Jonas Jenwald	78889646c8	Create a fallback annotation `id` for entries in `Annots` dictionaries that are not indirect objects (issue 7569) According to the PDF specification, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=86, entries in `Annots` dictionaries should be indirect objects, but obviously there're PDF generators that ignore this. Fixes 7569.	2016-08-27 10:56:16 +02:00
Jonas Jenwald	db1526c59e	Add unit-tests for asynchronous methods in `primitives.js` In PR 7520, I missed the fact that we currently have no unit-tests for `Dict_getAsync`.	2016-08-21 18:44:58 +02:00
Tim van der Meij	b4c8814fc9	Merge pull request #7534 from Snuffleupagus/isName-name-check Add a parameter to the `isName` function that enables checking not just that something is a `Name`, but also that the actual `name` property matches	2016-08-17 15:48:42 +02:00
Jonas Jenwald	544d29f5cb	Add a `recoveryMode` that suppresses errors from the `Parser`, and utilize it when searching for the main trailer in `XRef_indexObjects` (bug 1250079) Instead of having `Parser_getObj` fail unconditionally for the referenced PDF file, this patch attempts to let searching for the main trailer continue even if there are errors. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1250079.	2016-08-17 12:37:35 +02:00
Jonas Jenwald	af636aae96	Add a parameter to the `isName` function that enables checking not just that something is a `Name`, but also that the actual `name` property matches This is similar to the existing `isCmd` and `isDict` functions, which already support similar kinds of checks. With the updated `isName` function, we'll be able to simplify many callsites from: `isName(someVariable) && someVariable.name === 'someName'` to: `isName(someVariable, 'someName')`.	2016-08-10 11:15:03 +02:00
Jonas Jenwald	d70e07fb90	Add more unit-tests for `primitives.js`	2016-08-03 17:04:12 +02:00
Jonas Jenwald	77c6ed5389	Attempt to ignore multiple identical Tf (setFont) commands in `PartialEvaluator_getTextContent` (issue 5808) This patch improves the performance of issue 5808, but I'm not sure if it's enough to call it fixed. On average, this patch reduces the number of textLayer divs by a factor of 3, and it also reduces the time spent in `getTextContent` by a factor of ~2. The PDF file is generated by `Scribus PDF`, which for reasons I cannot understand is placing redundant `Tf` commands before every showText command. Note how the PDF file also contains lots of (basically) identical fonts, but with slightly different names, which causes unnecessary font-switching. This causes some unnecessary breaking of textLayer divs, but this issue cannot be easily worked around.	2016-07-27 21:37:52 +02:00
Yury Delendik	a02e2686b9	Merge pull request #7475 from Snuffleupagus/api-getTextContent-combineTextItems [api-minor] Add a parameter to `PDFPageProxy_getTextContent` that controls whether `PartialEvaluator_getTextContent` will attempt to combine same line text items	2016-07-27 08:34:24 -05:00
Jonas Jenwald	558a22cd02	Prevent errors when parsing Annotations with missing (or invalid) /Subtype entries (issue 7446) Note that I used a separate warning message for this case, instead of utilizing the same one as in the unsupported subtype case, to more clearly indicate that the PDF file itself is to blame rather than PDF.js. Fixes 7446.	2016-07-25 13:59:26 +02:00
Brendan Dahl	5678486802	Merge pull request #7347 from Snuffleupagus/evaluator-more-Ref_toString Slightly refactor the `fontRef` handling in `PartialEvaluator_loadFont` (issue 7403 and issue 7402)	2016-07-22 17:21:47 -07:00
Brendan Dahl	50d6e4f147	Merge pull request #7447 from Snuffleupagus/buildToUnicode-notdef Ignore .notdef in the `differences` array when building a fallback `toUnicode` map in `PartialEvaluator_buildToUnicode` (issue 5256)	2016-07-22 14:33:32 -07:00
Jonas Jenwald	4fe891c5e7	Add a reduced test-case for issue 7403	2016-07-21 16:04:07 +02:00
Tim van der Meij	10f9f11ec4	Merge pull request #7490 from Snuffleupagus/issue-7426 Don't map glyphs to the Lepcha Unicode block (issue 7426)	2016-07-21 14:39:19 +02:00
Jonas Jenwald	f297e4d17c	[api-minor] Add a parameter to `PDFPageProxy_getTextContent` that controls whether `PartialEvaluator_getTextContent` will attempt to combine same line text items From the discussion in issue 7445, it seems that there may be cases where an API consumer would want to get the text content as is, without combined text items.	2016-07-19 13:38:57 +02:00
Jonas Jenwald	90d19de935	Catch errors and continue parsing in `parseCMap` (issue 7492) After PR 7039, the PDF file in issue 7492 no longer renders at all, but note that text selection wasn't working correctly previously. The problem with the PDF file in issue 7492 is that the `cMap`, in the `toUnicode` entry in the font, contains an invalid name: ``` /CMapName /-usr-share-fonts-truetype-Panton-Panton Family-Fontfabric - Panton.otf,000-UTF16 def ``` When we parse that line, things obviously break because there are spaces present in the wrong places. To avoid that issue, the patch simply lets `parseCMap` continue when errors are encountered, to try and recover usable data. Note that by not aborting immediately when an error is encountered, we are also able to fix the text selection. Obviously, it could be argued that we should just immediately reject a corrupt `cMap`. But given that they usually are correct, it seems that trying to recover as much data as possible from a corrupt one can only be a good thing for both glyph mapping and text selection. Fixes 7492.	2016-07-18 16:39:56 +02:00
Jonas Jenwald	64783c8b6e	Don't map glyphs to the Lepcha Unicode block (issue 7426) In the PDF file in the issue, some of the glyphs end up being mapped to the Lepcha Unicode block; see https://en.wikipedia.org/wiki/Lepcha_(Unicode_block). This didn't use to matter, but after HarfBuzz updates that improved support for Lepcha fonts, in particular https://bugzilla.mozilla.org/show_bug.cgi?id=1249861, some glyphs are now moved horizontally. To avoid that, this patch adds the Lepcha block to the list of Unicode ranges that we skip when building the glyph mapping. Fixes 7426.	2016-07-17 16:53:36 +02:00
klemens	6f03f62327	trivial spelling fixes	2016-07-17 14:33:41 +02:00
Brendan Dahl	1f3f4a8dd7	Merge pull request #7441 from Snuffleupagus/issue-7439 Fallback to attempt to recover standard glyph names when amending the `charCodeToGlyphId` with entries from the `differences` array in `type1FontGlyphMapping` (issue 7439)	2016-07-06 13:02:21 -07:00
Jonas Jenwald	72c1df726e	Add a `getAttachments` unit-test for a PDF file that actually contains attachments	2016-07-02 13:13:30 +02:00
Brendan Dahl	e2e657e44f	Merge pull request #7390 from Snuffleupagus/issue-7180 Add upper-case `I` as a possible space replacement fallback in `Font.spaceWidth` to improve text-selection (issue 7180)	2016-06-29 15:11:19 -07:00
Jonas Jenwald	bdd58ab1d2	Ignore .notdef in the `differences` array when building a fallback `toUnicode` map in `PartialEvaluator_buildToUnicode` (issue 5256) Fixes 5256.	2016-06-27 16:20:23 +02:00
Jonas Jenwald	7866109af9	Fallback to attempt to recover standard glyph names when amending the `charCodeToGlyphId` with entries from the `differences` array in `type1FontGlyphMapping` (issue 7439) Fixes 7439.	2016-06-25 14:54:34 +02:00
Tim van der Meij	f97d52182a	Merge pull request #7341 from Snuffleupagus/getDestinationHash-Array [api-minor] Improve handling of links that are using explicit destination arrays	2016-06-09 00:29:10 +02:00
Jonas Jenwald	6a0b047bfa	Add upper-case `I` as a possible space replacement fallback in `Font.spaceWidth` to improve text-selection (issue 7180) In fonts with only upper-case glyphs, that are also missing a space glyph, `get spaceWidth` won't be able to return anything useful. By adding upper-case `I` as a fallback, we can thus improve text-selection in some PDF files. Note that locally, the patch causes slight movement in a few existing `text` tests, but in my opinion this actually looks like slight improvements. Fixes 7180.	2016-06-07 22:55:25 +02:00
Jonas Jenwald	6260fc09a3	Attempt to recover valid `format 3` FDSelect data from broken CFF fonts (bug 1146106) According to the CFF specification, see http://partners.adobe.com/public/developer/en/font/5176.CFF.pdf#G3.46884, for `format 3` FDSelect data: "The first range must have a ‘first’ GID of 0". Since the PDF file (attached in the bug) violates that part of the specification, this patch tries to recover valid FDSelect data to prevent OTS from rejecting the font. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1146106.	2016-06-06 18:20:52 +02:00
Yury Delendik	f0585f5d65	Merge pull request #7370 from Rob--W/crx-telemetry-7312 Add opt-out telemetry to the Chrome extension	2016-06-03 15:48:46 -05:00
Rob Wu	724308c57a	Add opt-out telemetry to the Chrome extension Privacy policy: https://github.com/Rob--W/pdfjs-telemetry#privacy-policy Unit tests (offline): ``` node test/chromium/test-telemetry.js ``` Server tests (requires that Nginx is installed): ``` git clone https://github.com/Rob--W/pdfjs-telemetry.git cd pdfjs-telemetry/ python testserver.py TestHttp TestHttps ``` Integration test (extension + server): - Build the extension - Edit build/chromium/telemetry.js and remove the check for chrome.runtime.id. - Start Chrome (preferably a new profile): chromium --user-data-dir=/tmp/pdftest --no-first-run - Open chrome://net-internals#events - Visit chrome://extensions and enable Developer mode. - Load unpacked extension, select build/chromium. - Go to the chrome://net-internals tab and filter on pdfjs.robwu.nl. - Click on URL_REQUEST and verify that the server replied with 204. - Reload the extension. - Verify that chrome://net-internals did not contain a new log request.	2016-06-03 20:36:57 +02:00
Jonas Jenwald	b02d560ae0	Fix errors in `setGState` in `PartialEvaluator_getTextContent` that prevents text-selection from working properly Currently `setGState` is completely broken, and looking through the history of that code, it seems to me that this may never have worked correctly. This patch fixes the text-selection in `extgstate.pdf` in the test-suite, which is also added as a `text` test.	2016-06-01 22:58:49 +02:00
Jonas Jenwald	98fe094d18	Let non-viewable Popup Annotations inherit the parent's Annotation Flags if the parent is viewable Fixes http://www.pdf-archive.com/2013/09/30/file2/file2.pdf. Note how it's not possible to show the various Popup Annotations in the above document. To fix that, this patch lets the Popup inherit the flags of the parent, in the special case where the parent is `viewable` and the Popup is not. In general, I don't think that a Popup must have the same flags set as the parent. However, it seems very strange to have a `viewable` parent annotation, and then not be able to view the Popup. Annoyingly the PDF specification doesn't, as far as I can find, mention anything about how this case should be handled, but this patch seems consistent with the actual behaviour in Adobe Reader.	2016-05-25 23:00:26 +02:00
Brendan Dahl	b86610ffdb	Merge pull request #7300 from Snuffleupagus/bug-1068432 Prevent adding invalid values in `CFFDict_setByKey` (bug 1068432)	2016-05-24 12:12:38 -07:00
Jonas Jenwald	b354682dd6	[api-minor] Let `LinkAnnotation`/`PDFLinkService_getDestinationHash` return a stringified version of the destination array for explicit destinations Currently for explicit destinations, compared to named destinations, we manually try to build a hash that often times is a quite poor representation of the actual destination. (Currently this only, kind of, works for `/XYZ` destinations.) For PDF files using explicit destinations, this can make it difficult/impossible to obtain a link to a specific section of the document through the URL. Note that in practice most PDF files, especially newer ones, use named destinations and these are thus unaffected by this patch. This patch also fixes an existing issue in `PDFLinkService_getDestinationHash`, where a named destination consisting of only a number would not be handled correctly. With the added, and already existing, type checks in place for destinations, I really don't think that this patch exposes any "sensitive" internal destination code not already accessible through normal hash parameters. Please note: Just trying to improve the algorithm that generates the hash is unfortunately not possible in general, since there are a number of cases where it will simply never work well. - First of all, note that `getDestinationHash` currently relies on the `_pagesRefCache`, hence it's possible that the hash returned is empty during e.g. ranged/streamed loading of a PDF file. - Second of all, the currently computed hash is actually dependent on the document rotation. With named destinations, the fetched internal destination array is rotationally invariant (as it should be), but this will not hold in general for the hash. We can easily avoid this issue by using a stringified destination array. - Third of all, note that according to the PDF specification[1], `GoToR` destinations may actually contain explicit destination arrays. Since we cannot really construct a hash in `annotation.js`, we currently have no good way to support those. Even though this case seems very rare in practice (I've not actually seen such a PDF file), it's in the specification, and this patch allows us to support that for "free". --- [1] http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.1951685	2016-05-21 14:14:07 +02:00
Jonas Jenwald	01ab15a6f1	[api-minor] Let `Catalog_getPageIndex` check that the `Ref` actually points to a /Page dictionary Currently the `getPageIndex` method will happily return `0`, even if the `Ref` parameter doesn't actually point to a proper /Page dictionary. Having the API trust that the consumer is doing the right thing seems error-prone, hence this patch which adds a check for this case. Given that the `Catalog_getPageIndex` method isn't used in any hot part of the codebase, this extra check shouldn't be a problem. (Note: in the standard viewer, it is only ever used from `PDFLinkService_navigateTo` if a destination needs to be resolved during document loading, which isn't common enough to be an issue IMHO.)	2016-05-21 14:13:41 +02:00
Tim van der Meij	db46829ef7	Merge pull request #7316 from timvandermeij/remove-unused Remove unused variables	2016-05-21 14:07:33 +02:00
Jonas Jenwald	c5c5a2a71f	Add basic unit-tests for unicode.js Re: issue 7261.	2016-05-19 19:45:45 +02:00
Jonas Jenwald	7ddb0bc718	Attempt to combine text runs positioned with `setTextMatrix`	2016-05-18 17:21:58 +02:00
Tim van der Meij	6a7012aaca	Remove unused variables These have been found using `gulp lint` in combination with the `unused: true` parameter for JSHint. Unfortunately there are too many false positives to enable this feature, but now that most globals have been removed because of the conversion to UMD the results are much more useful than before.	2016-05-11 16:11:13 +02:00
Jonas Jenwald	182d33800a	Ignore 'endobj' commands inside of `ObjStm` streams (issue 5241, bug 898610, bug 1037816) According to an example in the PDF specification, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=56, an `ObjStm` stream should not contain 'endobj' commands. Fixes 5241. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=898610. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1037816.	2016-05-09 09:50:45 +02:00
Jonas Jenwald	c9b6de3b16	Prevent adding invalid values in `CFFDict_setByKey` (bug 1068432) In the font in question, there are a couple of `topDict` entries that have invalid values (`0xF 0xF`, i.e. just eof markers without any actual numbers). This causes the `parseFloatOperand` function, inside `CFFParser_parseDict`, to return `NaN`. Currently we pass this broken font onto the browser, which OTS unsurprisingly rejects. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1068432.	2016-05-07 21:09:58 +02:00
Jonas Jenwald	29c4a604af	Split the font_spec.js unit-tests into cff_parser_spec.js and type1_parser_spec.js Re: issue 7261. Given that we have `gulp fonttest`, which tests the `fonts.js` functionality at a higher level, and that we have a lot of font specific reference tests, I'm not convinced that we also need unit-tests for it.	2016-05-03 09:37:36 +02:00
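
The `isName` change described in commit af636aae96 can be illustrated with a minimal sketch. This is not the actual pdf.js source: the `Name` class below is a hypothetical stand-in for the real primitive, and the body of `isName` is an assumption based only on the callsite simplification quoted in the commit message.

```javascript
// Hypothetical stand-in for the `Name` primitive; the real pdf.js
// implementation may differ.
class Name {
  constructor(name) {
    this.name = name;
  }
}

// Sketch of the two-argument check. With `name` omitted, only the type is
// tested; otherwise the `name` property has to match as well.
function isName(v, name) {
  return v instanceof Name && (name === undefined || v.name === name);
}

const someVariable = new Name('someName');

// Before: two separate checks at the callsite.
const before = isName(someVariable) && someVariable.name === 'someName';

// After: a single call that does both checks.
const after = isName(someVariable, 'someName');

console.log(before, after);
```

Keeping the second parameter optional means existing one-argument callsites would keep working unchanged, which is presumably why the commit compares it to `isCmd` and `isDict`.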


1403 Commits