pdf.js

History

Jonas Jenwald 61e19bee43 Build a fallback ToUnicode map for simple fonts (issue 8229)

In some fonts, the included `ToUnicode` data is incomplete causing text-selection to not work properly. For simple fonts that contain encoding data, we can manually build a `ToUnicode` map to attempt to improve things.

Please note that since we're currently using the `ToUnicode` data during glyph mapping, in an attempt to avoid rendering regressions, I purposely didn't want to amend to original `ToUnicode` data for this text-selection edge-case.
Instead, I opted for the current solution, which will (hopefully) give slightly better text-extraction results in PDF file with incomplete `ToUnicode` data.

According to the PDF specification, see [section 9.10.2](http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1873172):

> A conforming reader can use these methods, in the priority given, to map a character code to a Unicode value.
> ...

Reading that paragraph literally, it doesn't seem too unreasonable to use *different* methods for different charcodes.

Fixes 8229.

2017-11-26 14:45:15 +01:00

chromium

Fix inconsistent spacing and trailing commas in objects in test/ files, so we can enable the comma-dangle and object-curly-spacing ESLint rules later on

2017-06-02 13:04:04 +02:00

features

Remove usage of mozFillRule

2017-01-29 23:24:44 +01:00

font

Fix inconsistent spacing and trailing commas in objects in test/ files, so we can enable the comma-dangle and object-curly-spacing ESLint rules later on

2017-06-02 13:04:04 +02:00

pdfs

Build a fallback ToUnicode map for simple fonts (issue 8229)