The problem with the PDF files in the issue, besides the obviously broken XRef tables which we're able to recover from, is that many/most of the streams have Dictionaries where the `Length` entry is set to `0`. This causes us to return `NullStream`, instead of the appropriate one in `Parser_makeFilter`.
Fixes 6360.
Short story: somebody got lost in two different indices. pi is an index in the stream and is explained on page 198 of the 32000-spec (however 1-based there), and ps is an index to something in PDF.js. I used the code from flag 0 (which works) to understand which is which. It is also important to understand that for flags 1,2 and 3, the stream is always assigned to the same coordinates and colors. What changes is which "old" coordinates and colors are assigned to what is "missing" in the stream. This is why for these flags, the code is identical except for the assignments in the first "row".
This patch refactors the code responsible for setting the annotation's rectangle. Its goal is to:
- Actually check that the input array is actually an array, and if so, that it contains exactly four elements.
- Only call `normalizeRect` if the input array is valid, i.e., we do not call it for the default rectangle anymore.
Unit tests are provided just like with the other patches in this series.
Fixes#6106
To avoid future regressions, two new unit tests were added:
1. A new PDF based on the report from #6106, which contains an
OpenAction of type JavaScript and a string "this.print({...}".
2. An existing PDF from https://bugzil.la/1001080 (from #4698).
Although it does not matter, since we don't execute the JavaScript code,
I have also changed "print(true)" to "print({})" since the print method
takes an object (not a boolean). See "Printing PDF documents", page 62:
http://adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/js_developer_guide.pdf
- Use rimraf instead of a custom removeDirSync implementation - rimraf
deals with edge cases like EPERM on Windows.
- Detect when the process exits before it was requested via stop(),
instead of running the cleanup handler.
- Add fallback for process detection when the process exits before it
was requested. On *nix systems, this is done via pkill and pgrep, on
Windows this is done via wmic.
- Add some asserts to check the preconditions of the methods, and output
some status information to aid debugging in case of failure.
I have verified that these changes work on ArchLinux and Windows XP,
using Chrome and Firefox, as follows:
1. node make unittest
2. node make unittest
3. Restart the Firefox process via the task manager as soon as possible.
4. node make unittest
5. Temporary lock a file/directory within the temporary profile
directory until the tests have finished, and then unlock the file
within 10 seconds.
In all cases, the auxilary browser processes are killed, and the
temporary profile directory is wiped.
Basic mathematics would suggest that a double negative should always become positive, but it appears that Adobe Reader simply ignores that case. Hence I think that it makes sense for us to do the same.
Fixes 6218.
When the parser finds a stream, it retrieves the Length from the stream
dictionary and advances the lexer to the offset as specified in Length.
If this Length is incorrect, the lexer could end up anywhere.
When the lexer gets in an invalid state, it could throw errors. For
example, in issue 6108, the lexer ends up inside the stream data. This
stream has the ASCIIHexDecode filter, so all characters are made up from
ASCII characters, and the lexer interprets it as a command token. Tokens
cannot be longer than 127 bytes, so eventually 128 bytes are consumed
and the lexer throws "Command token too long" error.
Another possible error is "Illegal character: 41" when the lexer happens
to end up at a ')' due to the length mismatch.
These problems are solved by catching lexer errors and recovering the
parser via the existing stream length detection branch.
Xref offsets are relative to the start of the PDF data, not to the start
of the PDF file. This is clear if you look at the other code:
- In the XRef's readXRefTable and processXRefTable methods of XRef, the
offset of a xref entry is set to the bytes as given by a PDF file.
These values are always relative to the start of the PDF file (%PDF-).
- The XRef's readXRef method adds the start offset of the stream to
Xref entry's offset: "stream.pos = startXRef + stream.start".
Clearly, this line assumes that the entry offset excludes the start
offset.
However, when the PDF is parsed in recovery mode, the xref table is
filled with entries whose offset is relative to the start of the stream
rather than the PDF file. This is incorrect, and the fix is to subtract
the start offset of the stream from the entry's byte offset.
The manually created PDF file serves as a regression test. It is a valid
PDF, except:
- The integer to point to the start of the xref table and the %%EOF
trailer are missing. This will activate recovery mode in PDF.js
- Some junk was added before the start of the PDF file. This exposes the
bad offset bug.
Doing this helped uncover an issue with the `getDestination` implementation.
Currently if a named destination doesn't exist, the method (in `obj.js`) may return `undefined` which leads to the promise being stuck in a pending state.
*Note:* returning `null` for this case is consistent with other methods, e.g. `getOutline` and `getAttachments`.
The patch adds a test to ensure that `getStats` returns the expected result after the page has been parsed, and cleans up the existing test a bit.
Also, since I'm touching the file anyway, I'm making a small adjustment of the `getDownloadInfo` test. (I have no idea why I didn't just write it like this initially.)
This patch adds:
- Unit tests for the annotation border style class
- Regression test (self-made) for the annotation border style class
- Documentation generation using JSDoc
Currently in the tests which check that incorrect passwords are rejected, we don't ensure that the exceptions thrown are the ones we expect. This patch improves the current situation, so that we actually can be sure that the code "fails" in the correct way.
*Note:* This patch also fixes some cases of weird indentation in the file.
Fixes 6068.
The most notable issue with the font in question is that the `differences` array contains lots of strange entries (of the type `uniXXXX`, instead of proper glyph names).
The 'Version' field of the most recent document catalog, if present, is
intended to supersede the value in the file prologue.
This is significant for incrementally-built PDF documents and generators that
emit a low version in the prologue and later apply a format version based on
PDF features used, such as Apple's CoreGraphics/Quartz PDF backend.
Fixes the internal version variable, as well as the PDFFormatVersion reported
by the API and consumed by viewers.
This adds a checkbox with which one can disable scrolling, for example to look back at output during testing. Note that the styles are inline because the test runner removes all <style> elements for each test.
- Improve variable names and move variables to the top of the method
- Use constants where possible
- Only print the delay text if there is an atual delay set
- Simplify continuation logic in _nextTask
- Do not pass anything to clearCanvas: we can get the context from this.canvas there.
- Remove innerHTML and replace it with textContent. Add a comment why insertAdjancentHTML is so important for performance and runtime.
- Merge _quit and _sendQuitRequest.
- Extract NullTextLayerBuilder and SimpleTextLayerBuilder from the driver code and create classes for them.
- Move options to the constructor and pass in HTML elements instead of getting them in the driver.
- Remove unused masterMode variable.
- Minor method name updates.
The rest is only moving/indenting existing code and adding 'this'.
- Remove a hack for Chrome on Windows because we do not run Chrome on the Windows bot anymore, so it added unneeded complexity.
- Merge nextTask and continueNextTask functions. nextTask did nothing more than calling cleanup. This change also allows us to remove the callback for that function.
- Remove unnecessary one-line functions.
For passwords where the encoding already is correct, the conversion is a no-op.
Also, since `encodeURIComponent` might throw, we need to make sure that we handle that case too.
Fixes 6010.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1050040.
With this patch the file is completely readable, but given that the font is broken enough to be rejected by OTS the rendering differs slightly from Adobe Reader.
*Note:* the PDF file is sufficiently broken that even Adobe Reader complains about the font, *and* also about another more general issue.
Prior to PR 4259, we *incorrectly* ignored `toUnicode` for Type3 fonts. Since we now handle that correctly, this patch adds a `text` test-case to prevent regressions.
According to practical experiments, falling back to "Helvetica" when we encounter a non-embedded "[Century Gothic](http://en.wikipedia.org/wiki/Century_Gothic)" `CIDFontType2` font seems to work well.
(Also, the section on Wikipedia about "Printer ink usage" *might* provide some anecdotal evidence that Century Gothic is a fairly standard sans-serif font.)
Obviously this patch doesn't make "Century Gothic" fonts render perfectly, as is often the case with non-embedded fonts, but all the text is now legible in the referenced issues.
Fixes 4722.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=879561.
When submitting PR 5276 there wasn't a good PDF file to include in the test suite. However, with https://bugzilla.mozilla.org/show_bug.cgi?id=1108753, we now have a better source for a test file, hence this patch.
As a follow-up to PR #4795, this patch improves the `getData` unit test to ensure that the returned data is of the expected type and of the correct length.
The files issue3115.pdf and issue2337.pdf are identical, the only difference being that the first one is an `eq` test and the second one a `load` test. Hence there is no reason to keep the `load` test, since it's just a subset of the `eq` test.
When the binary CMaps were added, some of the relevant unit tests were not changed. This patch updates them, so that we actually test the current implementation.
What's somewhat troubling here is that we currently have CMap unit tests that passes, *despite* not working as intended (the CMap files doesn't load).
The `console.log` statement in evaluator_spec.js is obviously not needed. In obj.js it could have been replaced by `info`, but that seemed unnecessary given the already existing `error`.
readCharCode() returns two values, and currently allocates a length-2
array on every call to do so. This change makes it instead us a
passed-in object which can be reused.
This tiny change reduces the total JS allocations done for the document
in Mozilla bug 992125 by 4.2%.
cid chars are 16-bit unsigned integers. Currently we convert them to
single-char strings when inserting them into the CMap, and then convert
them back to integers when extracting them from the CMap. This patch
changes CMap so that cid chars stay in integer format throughout, saving
both time and space.
When loading the PDF from issue #4580, this change reduces peak RSS from
~600 to ~370 MiB. It also improves overall speed on that PDF by ~26%,
going from 724 ms to 533 ms.
PR 4259 fixed a large number of font bugs, but none of those where added as test-cases. This was, in my opinion, unfortunate since it increases the risk of regressions in the future when other font bugs are fixed.
This PR simply adds a few more test-cases, to improve our test coverage somewhat.
It's currently possible to step through the test results using the <kbd>N</kbd> and <kbd>P</kbd> keys, and you can also switch between test and reference images with the <kbd>T</kbd> key.
However if you want to highlight the differences, that can only be done by clicking. This has always annoyed me somewhat, so this patch adds support for toggling the differences view with the <kbd>D</kbd> key.
*Added AES128 Encryption
*Added AES258 Encryption/Decryption
*Added SHA256
*Added SHA512
*Added class to handle 8 byte integers and associated bit operations
*Added SHA384
*Added routines to handle new algorithm and perform PDF2.0 hashing.
"text/javascript" is not a correct MIME type (the correct one is
"application/javascript") but it's not even needed; all browsers default
to the correct type and treat it as executable JS when type is ommited.
Since not all browsers recognize the "application/javascript" MIME type
the only way to both stay compliant and to support all popular browsers
is to omit the type. It's also shorter this way.