There's a number of issues with the fonts in the referenced PDF file. First of all, they contain broken `ToUnicode` data (`NUL` bytes all over the place). However even if you skip those, the `ToUnicode` data appears to contain nothing but a `IdentityH` CMap which won't help provide a proper glyph mapping.
The real issue actually turns out to be that the PDF file uses the "Calibri" font[1], but doesn't include any font files. Since that one isn't a standard font, and uses a fairly different CID to GID map compared to the standard fonts, we're not able to render the file even remotely correct.
To work around this, I'm thus proposing that we include a (incomplete) glyph map for Calibri, and fallback to the standard Helvetica font. Obviously this isn't going to look perfect, but it's really the best that we can hope to achieve given that the PDF file is missing the necessary font data.
Finally, please note that none of the PDF readers I've tried (Adobe Reader, PDFium in Chrome) were able to extract the text (which isn't very surprising, given the broken `ToUnicode` data).
Fixes 9195.
---
[1] According to Wikipedia, see https://en.wikipedia.org/wiki/Calibri, Calibri is (primarily) a Windows font.
For sufficiently large page sizes, always limiting the minimum zoom level to 25% seem a bit too high. One example is pages, e.g. the first one, in:
Hence I think that it makes sense to lower `MIN_SCALE` slightly, since other PDF viewers (e.g. Adobe Reader) isn't limiting the minimum zoom level as aggressively. Obviously this will allow a greater number of pages to be visible at the same time in the viewer, but given that they will be small that shouldn't be an issue.
Note also that e.g. the `page-fit`/`page-width` zoom levels already allow `< MIN_SCALE` values, so I don't see why we shouldn't allow users the same functionality directly.
In some fonts, the included `ToUnicode` data is incomplete causing text-selection to not work properly. For simple fonts that contain encoding data, we can manually build a `ToUnicode` map to attempt to improve things.
Please note that since we're currently using the `ToUnicode` data during glyph mapping, in an attempt to avoid rendering regressions, I purposely didn't want to amend to original `ToUnicode` data for this text-selection edge-case.
Instead, I opted for the current solution, which will (hopefully) give slightly better text-extraction results in PDF file with incomplete `ToUnicode` data.
According to the PDF specification, see [section 9.10.2](http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1873172):
> A conforming reader can use these methods, in the priority given, to map a character code to a Unicode value.
> ...
Reading that paragraph literally, it doesn't seem too unreasonable to use *different* methods for different charcodes.
Fixes 8229.
This was added, during the refactoring in PR 8556, to avoid outright breaking third-party users of the default viewer.
With PDF.js version `2.0`, where we're making API changes that aren't backwards compatible, we ought to be able to remove this piece of viewer code as well.
Split the existing `WebGLUtils` in two classes, a private `WebGLUtils` and a public `WebGLContext`, and utilize the latter in the API to allow various code to access the methods of `WebGLUtils`
We've never attempted to limit the errors displayed in the default viewer to the current PDF file, but that's not really been a problem before. However after PR 7926, it's now possible to get password related error messages for *previously* opened PDF files in the default viewer.
**STR:**
1. Open a password protected PDF file, e.g. `issue6010_1.pdf` from the test-suite.
2. Cancel the password prompt.
3. Open any new PDF file in the viewer.
**AR:**
The error UI is displayed, with a `No password given` message.
**ER:**
No error displayed, since it's only relevent for a now closed PDF file.
This is obviously a minor issue, caused by us now rejecting the still pending `pdfLoadingTask` during the `PDFViewerApplication.close` call, but I don't think that it (generally) makes sense to show errors if they're not relevant to the *currently* displayed PDF file.
adding commas each in two sentences in order to make readers fully understand when reading index.md and to organize the context properly based on grammatical rules.