When the parser finds a stream, it retrieves the Length from the stream
dictionary and advances the lexer to the offset as specified in Length.
If this Length is incorrect, the lexer could end up anywhere.
When the lexer gets in an invalid state, it could throw errors. For
example, in issue 6108, the lexer ends up inside the stream data. This
stream has the ASCIIHexDecode filter, so all characters are made up from
ASCII characters, and the lexer interprets it as a command token. Tokens
cannot be longer than 127 bytes, so eventually 128 bytes are consumed
and the lexer throws "Command token too long" error.
Another possible error is "Illegal character: 41" when the lexer happens
to end up at a ')' due to the length mismatch.
These problems are solved by catching lexer errors and recovering the
parser via the existing stream length detection branch.
Xref offsets are relative to the start of the PDF data, not to the start
of the PDF file. This is clear if you look at the other code:
- In the XRef's readXRefTable and processXRefTable methods of XRef, the
offset of a xref entry is set to the bytes as given by a PDF file.
These values are always relative to the start of the PDF file (%PDF-).
- The XRef's readXRef method adds the start offset of the stream to
Xref entry's offset: "stream.pos = startXRef + stream.start".
Clearly, this line assumes that the entry offset excludes the start
offset.
However, when the PDF is parsed in recovery mode, the xref table is
filled with entries whose offset is relative to the start of the stream
rather than the PDF file. This is incorrect, and the fix is to subtract
the start offset of the stream from the entry's byte offset.
The manually created PDF file serves as a regression test. It is a valid
PDF, except:
- The integer to point to the start of the xref table and the %%EOF
trailer are missing. This will activate recovery mode in PDF.js
- Some junk was added before the start of the PDF file. This exposes the
bad offset bug.
The PDF specification (cited below) specifies a maximum length of a name
in bytes as a minimal architectural limit. This means that PDF *writers*
should not create names that exceed 127 bytes.
It does not forbid PDF *readers* to accept such names though. These
names are only used internally to link PDF objects to other objects. For
these use cases, the lengths of the names do not really matter. Hence I
have changed the implementation to not treat long names as errors, but
warnings.
> (7.3.5) The length of a name shall be subject to an implementation
> limit; see Annex C.
>
> (Annex C.2) Table C.1 describes the minimum architectural limits that
> should be accommodated by conforming readers running on 32-bit
> machines. Because conforming readers may be subject to these limits,
> conforming writers producing PDF files should remain within them.
>
> (Table C.1) name 127 "Maximum length of a name, in bytes."
http://adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
For named destinations that are contained in a `Dict`, as opposed to a `NameTree`, we currently iterate through the *entire* dictionary just to fetch *one* destination.
This code appears to simply have been copy-pasted from the `get destinations` method, but in its current form it's quite unnecessary/inefficient since can just get the required destination directly instead.
Doing this helped uncover an issue with the `getDestination` implementation.
Currently if a named destination doesn't exist, the method (in `obj.js`) may return `undefined` which leads to the promise being stuck in a pending state.
*Note:* returning `null` for this case is consistent with other methods, e.g. `getOutline` and `getAttachments`.
This became obsolete in bdeca30fbf. All it does is call the Annotation contructor and add hasHtml. This patch lets the Link and Text annotations directly extend the Annotation class and add hasHtml themselves.
This patch also removes an unused global.
Recently I've landed a number patches which fixed issues with ColorSpaces. In most of these cases the cause of the failures were, either partially or entirely, related to the fact that we didn't resolve indirect objects (i.e. the code was missing `xref.fetchIfRef(...)`).
The purpose of this patch is to fix the few remaining cases where indirect objects *could* potentially cause failures.
Given that we have seen how this causes failures in practice, I thus think that it makes sense to try and avoid further issues, instead of waiting for users to file even more bugs for this part of the code-base.
Currently non-embedded ArialBlack fonts are not rendered bold enough, compared to e.g. Adobe Reader.
The issue is that we set the font weight to `bolder`, but since that is actually relative to the font weight of the parent, the result is that there's no practical difference from just using `bold`.
This patch attempts to address that, by explicitly setting the font weight to the maximum value instead (see https://developer.mozilla.org/en-US/docs/Web/CSS/font-weight).
*Note:* I expect one test "failure" in `issue5801`, which in this case is an improvement, since that PDF file uses ArialBlack.
Fixes 6068.
The most notable issue with the font in question is that the `differences` array contains lots of strange entries (of the type `uniXXXX`, instead of proper glyph names).
The 'Version' field of the most recent document catalog, if present, is
intended to supersede the value in the file prologue.
This is significant for incrementally-built PDF documents and generators that
emit a low version in the prologue and later apply a format version based on
PDF features used, such as Apple's CoreGraphics/Quartz PDF backend.
Fixes the internal version variable, as well as the PDFFormatVersion reported
by the API and consumed by viewers.
For passwords where the encoding already is correct, the conversion is a no-op.
Also, since `encodeURIComponent` might throw, we need to make sure that we handle that case too.
Fixes 6010.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1050040.
With this patch the file is completely readable, but given that the font is broken enough to be rejected by OTS the rendering differs slightly from Adobe Reader.
*Note:* the PDF file is sufficiently broken that even Adobe Reader complains about the font, *and* also about another more general issue.
Instead of trying to hack around various browser defects, let's just disable PresentationMode in the affected browsers. This patch:
- Disables PresentationMode in IE11+ when the viewer is embedded; fixes 4711.
Set transformation matrix in (polyfilled) mozPrintCallback when a scale
is applied. Removed _scaleX and _scaleY in favor of _transformMatrix to
emphasize that the caller MUST ensure that the state of the matrix is
correct before `addContextCurrentTransform` is called.
Currently if a font contains a `toUnicode` entry, we always create a new `ToUnicodeMap` in evaluator.js. This is done even for `IdentityV/IdentityH`, despite to possibility to use the much more compact `IdentityToUnicodeMap` representation.
This patch refactors the `IdentityH/IdentityV` cases, to:
- Avoid calling `IdentityCMap.getMap`, since this prevents allocating and iterating through an array with 65536 elements.
- Ensure that the handling of `toUnicode` is actually correct in fonts.js.
We rely on `toUnicode instanceof IdentityToUnicodeMap` in a few places, and currently this does not work correctly for `IdentityH/IdentityV`.
This is a very small follow-up to PR 5536, which sets `isStandardFont` to `false` instead of `undefined` (as currently happens for some font names).
Since the patch is so small, I hope it's OK to also fix an unrelated copy-and-paste error in a comment that was added in PR 5260.
This change does the following:
* Address TODO to remove getEmptyContainer helper.
* Not set container bg-color. The old code is incorrect, causing it to
not have any effect. It sets color to an array (item.color) rather
css string. Also in most cases it would set it to black background
which is incorrect.
* only add border instructions when there is actually a border
* reduce memory consumption by not creating new 3 element arrays for
annotation colors. In fact according to spec, this would be incorrect,
as the default should be "transparent" for an empty array. Adobe
Reader interprets a missing color array as black however.
Note that only Link annotations were actually setting a border style and
color. While Text annotations might have calculated a border they did
not color it. This behaviour is now controlled by the boolean flag.
According to practical experiments, falling back to "Helvetica" when we encounter a non-embedded "[Century Gothic](http://en.wikipedia.org/wiki/Century_Gothic)" `CIDFontType2` font seems to work well.
(Also, the section on Wikipedia about "Printer ink usage" *might* provide some anecdotal evidence that Century Gothic is a fairly standard sans-serif font.)
Obviously this patch doesn't make "Century Gothic" fonts render perfectly, as is often the case with non-embedded fonts, but all the text is now legible in the referenced issues.
Fixes 4722.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=879561.