Fallback to attempt to recover standard glyph names when amending the `charCodeToGlyphId` with entries from the `differences` array in `type1FontGlyphMapping` (issue 7439)
In fonts with only upper-case glyphs, that are also missing a space glyph, `get spaceWidth` won't be able to return anything useful.
By adding upper-case `I` as a fallback, we can thus improve text-selection in some PDF files.
Note that locally, the patch causes slight movement in a few existing `text` tests, but in my opinion this actually looks like slight improvements.
Fixes 7180.
Currently the `isSpace` utility function is a member of `Lexer`, which seems suboptimal, given that it's placed in `core/parser.js`. In practice, this means that in a number of `core/*.js` files we thus have an *otherwise* completely unnecessary dependency on `core/parser.js` for a one-line function.
Instead, this patch moves `isSpace` into `shared/util.js` which seems more appropriate for this kind of utility function. Not to mention that since all the affected `core/*.js` files already depends on `shared/util.js`, this doesn't incur any more file dependencies.
Currently `setGState` is completely broken, and looking through the history of that code, it seems to me that this may never have worked correctly.
This patch fixes the text-selection in `extgstate.pdf` in the test-suite, which is also added as a `text` test.
Fixes http://www.pdf-archive.com/2013/09/30/file2/file2.pdf.
Note how it's not possible to show the various Popup Annotations in the above document.
To fix that, this patch lets the Popup inherit the flags of the parent, in the special case where the parent is `viewable` *and* the Popup is not.
In general, I don't think that a Popup must have the same flags set as the parent. However, it seems very strange to have a `viewable` parent annotation, and then not being able to view the Popup.
Annoyingly the PDF specification doesn't, as far as I can find, mention anything about how this case should be handled, but this patch seem consistent with the actual behaviour in Adobe Reader.
Currently for explicit destinations, compared to named destinations, we manually try to build a hash that often times is a quite poor representation of the *actual* destination. (Currently this only, kind of, works for `\XYZ` destinations.)
For PDF files using explicit destinations, this can make it difficult/impossible to obtain a link to a specific section of the document through the URL.
Note that in practice most PDF files, especially newer ones, use named destinations and these are thus unnaffected by this patch.
This patch also fixes an existing issue in `PDFLinkService_getDestinationHash`, where a named destination consisting of only a number would not be handled correctly.
With the added, and already existing, type checks in place for destinations, I really don't think that this patch exposes any "sensitive" internal destination code not already accessible through normal hash parameters.
*Please note:* Just trying to improve the algorithm that generates the hash is unfortunately not possible in general, since there are a number of cases where it will simply never work well.
- First of all, note that `getDestinationHash` currently relies on the `_pagesRefCache`, hence it's possible that the hash returned is empty during e.g. ranged/streamed loading of a PDF file.
- Second of all, the currently computed hash is actually dependent on the document rotation. With named destinations, the fetched internal destination array is rotational invariant (as it should be), but this will not hold in general for the hash. We can easily avoid this issue by using a stringified destination array.
- Third of all, note that according to the PDF specification[1], `GoToR` destinations may actually contain explicit destination arrays. Since we cannot really construct a hash in `annotation.js`, we currently have no good way to support those. Even though this case seems *very* rare in practice (I've not actually seen such a PDF file), it's in the specification, and this patch allows us to support that for "free".
---
[1] http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.1951685
Currently the `getPageIndex` method will happily return `0`, even if the `Ref` parameter doesn't actually point to a proper /Page dictionary.
Having the API trust that the consumer is doing the right thing seems error-prone, hence this patch which adds a check for this case.
Given that the `Catalog_getPageIndex` method isn't used in any hot part of the codebase, this extra check shouldn't be a problem.
(Note: in the standard viewer, it is only ever used from `PDFLinkService_navigateTo` if a destination needs to be resolved during document loading, which isn't common enough to be an issue IMHO.)
These have been found using `gulp lint` in combination with the `unused:
true` parameter for JSHint. Unfortunately there are too many false
positives to enable this feature, but now that most globals have been
removed because of the conversion to UMD the results are much more
useful than before.
In the font in question, there are a couple of `topDict` entries that have invalid values (`0xF 0xF`, i.e. just eof markers without any actual numbers).
This causes the `parseFloatOperand` function, inside `CFFParser_parseDict`, to return `NaN`. Currently we pass this broken font onto the browser, which OTS unsurprisingly rejects.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1068432.
As evident from e.g. PRs 6485 and 7118, some bad PDF generators unfortunately create Arrays where *some* elements are indirect objects (i.e. `Ref`s). This seems to mostly affect Arrays that contain numbers, such as e.g. `Matrix/FontMatrix/BBox/FontBBox/Rect/Color/...`, and has manifested itself in PDF files that fail to render correctly (some elements are missing).
The problem in both the cases above, besides broken rendering, was that there were *no* errors/warnings that indicated what the problem was, making it difficult to pinpoint the issue.
Hence this patch, where I've audited all usages of `Dict_get` in `src/core/` files, and replaced it with `Dict_getArray` where appropriate to try and prevent unnecessary future bugs.
- Add support for the 'NewWindow' property.
- Ensure that destinations are applied to the *remote* document, instead of the current one.
- Handle the `F` entry being a standard string, instead of a dictionary.
Using `new {Name,Cmd}` should be avoided, since it creates a new object on *every* call, whereas `{Name,Cmd}.get` uses caches to only create *one* object regardless of how many times they are called.
Most of these are found in the unit-tests, where increased memory usage probably doesn't matter very much. But it still seems good to get rid of those cases, since no part of the codebase ought to advertise that usage.
Given the small size of the patch, I'm also tweaking a few comments and class names.
Note that in the PDF files provided by the reporter, this issue was limited to `Rect` arrays in AcroForm entries (which we currently don't support).
However, since a bad PDF generator could create this problem in *any* kind of annotation, the reduced test-case included here uses a simple LinkAnnotation instead.
Fixes 7115.
According to "The table directory" under https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6.html#Directory, TrueType font tables should have `uint32` checksums.
This is something that I noticed, and was initially confused about, while debugging a TrueType issue.
As far as I can tell, the current (`int32`) checksums we use doesn't cause any issues in practice. However, I do think that this should be addressed to agree with the specification, and to reduce possible confusion when reading the font code.
Currently there's a lot of duplicate code for non-embedded `toFontChar`, which this patch simplifies by extracting the code into a helper function instead.
This patch adds a `getUnicodeForGlyph` helper function, which is used to recover Unicode values for non-standard glyph names.
Some PDF generators, e.g. Scribus PDF, use improper `uniXXXX` glyph names which breaks the glyph mapping. We can avoid this by converting them to "standard" glyph names instead.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1132849.
Fixes 6893.
Fixes 6894.
In the PDF file in question, some of the 'name' table entries have `record.length === 0`. This becomes problematic in the non-unicode case, since `font.getBytes(0)` will fetch the *entire* stream.
Given that OTS rejects 'name' entries larger than `2^16`, this thus explain the sanitizer errors.
Fixes 7020.
*A more robust solution for issue 6066.*
As a temporary work-around for (the upstream) [bug 1164199](https://bugzilla.mozilla.org/show_bug.cgi?id=1164199), we parsed *all* images in the Firefox addon during a short time.
Doing so uncovered an issue with our image handling (see 6066), for JPEG images with a `DeviceGray` ColorSpace *and* `bpc !== 1` (bits per component).
As long as we let the browser handle image decoding in this case, this isn't going to be an issue, but I do think that we should proactively fix this to avoid future issues if we change where the images are decoded (in `jpg.js` vs in browser).
Also, we currently don't seem to have a test-case for that kind of image data.
Currently the `C` entry in an outline item is returned as is, which is neither particularly useful nor what the API documentation claims.
This patch also adds unit-tests for both the color handling, and the `F` entry (bold/italic flags).
`Dict_getAll` is problematic for a number of reasons. First of all, as issue 6961 shows, it can be really bad for performance, since it dereferences all indirect objects.
Second of all, all the derefencing can lead to data being unncessarily requested when ranged/chunked loading is used, thus unnecessarily delaying rendering.
Note: For cases where `Dict_getAll` was previously used, `Dict_getKeys` in combination with `Dict_get` can be used instead. This has the advantage that data isn't requested until it's actually needed.
For the operators that we currently support, the arguments are not `Dict`s, which means that it's not really necessary to use `Dict_getAll` in `EvaluatorPreprocessor_read`.
Also, I do think that if/when we support operators that use `Dict`s as arguments, that should be dealt with in the corresponding `case` in `PartialEvaluator_getOperatorList` which handles the operator.
The only reason that I can find for using `Dict_getAll` like that, is that prior to PR 6550 we would just append certain (currently unsupported) operators without doing any further processing/checking. But as issue 6549 showed, that can lead to issues in practice, which is why it was changed.
In an effort to prevent possible issue with unsupported operators, this patch simply ignores operators with `Dict` arguments in `PartialEvaluator_getOperatorList`.
For the `CalGray`/`CalRGB`/`Lab` colour spaces, we're currently using `getAll` to retrieve the parameters. However that's not really necessary, since we may just as well explicitly `get` the needed parameters instead.
Some bad PDF generators, in particular "Scribus PDF", duplicates resources *a lot* at various levels of the PDF files. This can lead to `PartialEvaluator_hasBlendModes` taking an unreasonable amount of time to complete.
The reason is that the current code is using `Dict_getAll`, which recursively dereferences *all* indirect objects, which can be really slow. This patch instead uses `Dict_getKeys`, and then manually looks up only the necessary indirect objects.
I've added the PDF file as a `load` test. The most important thing here is probably to ensure that the file remains available in the repo, and the comment should help reduced the chance of regressions. (Note that locally, the `load` test times out without this patch, but we cannot really assume that that always happens.)
Fixes 6961.
*This patch is based on something I noticed while debugging some of the PDF files in issue 6931.*
In a number of the cases in `setGState`, we're implicitly assuming that we're not dealing with indirect objects (i.e. `Ref`s). See e.g. the 'Font' case, or the various cases where we simply do `gStateObj.push([key, value]);` (since the code in `canvas.js` won't be able to deal with a `Ref` for those cases).
The reason that I didn't use `Dict_forEach` instead, is that it would re-introduce the unncessary closures that PR 5205 removed.
The intention of PR 5192 was to avoid adding empty `setGState` ops to the operatorList. But the patch accidentally used `>=`, which means that it's not actually working as intended, since empty arrays always have `length === 0`.
Even though the currently known test-cases render correctly without this patch, that seems more like a lucky coincidence, given that there's no guarantee that `transferMap[255] === 0` for every possible transfer function.
This patch fixes an issue that I inadvertently introduced in PR 5815, where we accidentally modify the `Differences` array in the encoding dictionary for indirect objects.
Instead of this change, we could also have used the now existing `Dict_getArray`. However in this case I don't think that would have been a good idea, since it would mean iterating through the array *twice*.