In the rare situation that an optional content dictionary lacks a /Type-entry we currently throw, which may prevent e.g. Form XObjects from rendering completely.
Fixes https://bugs.ghostscript.com/show_bug.cgi?id=707147
The goal is to always have something which is focusable to let the user select
it with the keyboard.
It fixes the mentioned bug because, the annotation layer will now have a container
to attach the canvas for annotations having their own canvas.
When searching for "endobj"-operators, make sure that we don't accidentally match a "trailer"-string in /Content-streams without /Filter-entries (i.e. streams that contain "raw" and thus human-readable data).
When the flag is set, the appearance has to be generated from the value so it's
useless/meaningless to extract the content from the existing appearance.
When a pdf has /NeedAppearances set to true, the annotation appearance must be
generated from its value and we must take into account the hasOwnCanvas property.
- Take into account the page translation,
- Take into account the correct translation for the editor border,
- Take into account the position of the first glyph in the annotation,
- Take into account the rotation of the editor.
Close#16633.
This patch is the result of me going through some old issues regarding non-embedded Wingdings support.
There's a few different things wrong in the referenced PDF document:
- The /BaseFont and /FontName entries don't agree on the name of the fonts, with one font using `/BaseFont /Wingdings-Regular` and `/FontName /wg09np` which obviously makes no sense.
To address this we'll compare the font-names against our lists of known ones and ignore /FontName entries that don't make sense iff the /BaseFont entry is a known font-name.
- The non-embedded Wingdings font also set an incorrect /Encoding, in this case /MacRomanEncoding, which should have been fixed by PR 16465. However this doesn't work since the font has *bogus* font-flags, that fail to categorize the font as Symbolic.
To address this we'll also compare the font-name against the list of known symbol fonts.
Now that font-substitution has been implemented, we should be able to do much a better job at supporting non-embedded Wingdings fonts.
Given that this is a Windows-specific font, see https://en.wikipedia.org/wiki/Wingdings, this is however not guaranteed to work (well) on other platforms.
The affected font is non-embedded ZapfDingbats, however the PDF document for some inexplicable reason specifies the encoding as "WinAnsiEncoding" (which is obviously wrong).
To work-around this bug in the PDF generator, we'll simply ignore any explicitly specified named encoding for non-embedded symbol fonts.
Given that inline images may contain "EI"-sequences in the image-data itself, actually finding the end-of-image operator isn't always straightforward.
Here we extend the implementation from PR 12028 to potentially check all of the following bytes, rather than stopping immediately. While we have fairly decent test-coverage for this code, whenever you're changing it there's unfortunately a slightly higher than normal risk of regressions. (You'd really wish that PDF generators just stop using inline images.)
- if the contours count is lower than -1, the glyph is really likely wrong
so just remove it from the font;
- if a contour has the repeat flag then repeats count mustn't be 0.
According to https://en.wikipedia.org/wiki/Impact_(typeface) this font should be available on all current versions of Windows, and with the recently added font-substitution we should actually be able to render it correctly (at least on Windows).
When fixing bug 1766987, I thought the field formatted value came from
the result of the format callback: I was wrong. The format callback is ran
but the value is unused (maybe it's useful to set some global vars... or
it's just a bug in Acrobat). Anyway the value to display is the one rendered
in the AP stream.
The field value setter has been simplified and that fixes issue #16409.
This essentially extends PR 11218 to also apply when looking up the final font-reference, via the XRef-table, fails because the font isn't available.
This patch also changes `PartialEvaluator.fallbackFontDict` to simply use "Helvetica" as the default font-name, since that seems generally reasonable given the now existing font-substitution code.
The /Decode-implementation in the our JPEG decoder, i.e. `src/core/jpg.js`, seems to only handle *inverting* of images properly. To support arbitrary /Decode-entries correctly we'll always use the `PDFImage.decodeBuffer` method, even for "simple" JPEG images, which should be fine since non-default /Decode-entries aren't a very common occurrence.
*Please note:* This patch will lead to a little bit of movement in some existing test-cases, however it should be virtually imperceivable to the naked eye.
Some arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized
when creating the search query. So to avoid to duplicate the normalization code,
everything is moved in the find controller.
The previous code to normalize text was using NFKC but with a hardcoded map, hence it
has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size
by 30kb).
In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into
account some RTL unicode ranges, the generated font wasn't embedding the mapping this
char and the unicode ranges in the OS/2 table weren't up-to-date.
When normalized some chars can be replaced by several ones and it induced to have
some extra chars in the text layer. To avoid any regression, when copying some text
from the text layer, a copied string is normalized (NFKC) before being put in the
clipboard (it works like this in either Acrobat or Chrome).
Currently we have two separate image-caches on the worker-thread:
- A local one, which is unique to each `PartialEvaluator.getOperatorList` invocation. This one caches both names *and* references, since image-resources may be accessed in either way.
- A global one, which applies to the entire PDF documents and all its pages. This one only caches references, since nothing else would work.
This patch introduces a third image-cache, which essentially sits "between" the two existing ones. The new `RegionalImageCache`[1] will be usable throughout a `PartialEvaluator` instance, and consequently it *only* caches references, which thus allows us to keep track of repeated image-resources found in e.g. different /Form and /SMask objects.
---
[1] For lack of a better word, since naming things is hard...