Moreover, group the lexer unit tests per method. This matches what we
do for other classes and makes it more easily visible which methods
we don't or insufficiently unit test.
The parser itself is not unit tested yet, so this patch provides a start
for doing so. The `inlineStreamSkipEI` method is used in other end
marker detection methods, so it's important that its functionality is
correct for proper parsing.
Note how `PDFDocumentProxy.destroy` is a nothing more than an alias for `PDFDocumentLoadingTask.destroy`. While removing the latter method would be a breaking API change, there's still room for at least some clean-up here.
The main changes in this patch are:
- Stop providing a `PDFDocumentLoadingTask` instance *separately* when creating a `PDFDocumentProxy`, since the loadingTask is already available through the `WorkerTransport` instance.
- Stop tracking the `PDFDocumentProxy` instance on the `WorkerTransport`, since that property is completely unused.
- Simplify the 'Multiple `getDocument` instances' unit-tests by only destroying *once*, rather than twice, for each document.
For Type3 fonts text-selection is often not that great, and there's a couple of heuristics used to try and improve things. This patch simple extends those heuristics a bit, and fixes a pre-existing "naive" array comparison, but this all feels a bit brittle to say the least.
The existing Type3 test-coverage isn't that great in general, and in particular Type3 `text` tests are few and far between, hence why this patch adds *two* different new `text` tests.
The `DOMCanvasFactory` class is now fully covered. Moreover, missing
cases for the `getFilenameFromUrl` function have been included.
Finally, `var` usage has been removed.
The `src/shared/util.js` file is being bundled into both the `pdf.js` and `pdf.worker.js` files, meaning that its code is by definition duplicated.
Some main-thread only utility functions have already been moved to a separate `src/display/display_utils.js` file, and this patch simply extends that concept to utility functions which are used *only* on the worker-thread.
Note in particular the `getInheritableProperty` function, which expects a `Dict` as input and thus *cannot* possibly ever be used on the main-thread.
This file (currently) contains not only DOM-specific helper functions/classes, but is used generally for various helper code relevant for main-thread functionality.
*Hopefully this patch makes sense, since I cannot claim to fully understand this function.*
With the changes made in PR 3354 *some* Type3 glyph outlines are no longer rendering correctly, since the final paths were being accidentally ignored.
The fact that Type3 fonts are not very common in PDF documents, and that most Type3 glyphs are unaffected by this regression, probably explains why this has gone unnoticed since 2013.
There doesn't appear to be any particular reason for only running these unit-tests in browsers, since the `PDFDataRangeTransport` functionality itself should be back-end agnostic.
This allows simplification of the 'creates pdf doc from URL and aborts loading after worker initialized' API unit-test.
Note that the `DOMFileReaderFactory` uses the Fetch API, for simplicity, since it should be available in all browsers where we're running tests.
pdf.js had a problem when copying characters on supplementary planes
(0xPPXXXX where PP is nonzero). This is because certain methods of
PartialEvaluator use classic String.fromCharCode instead of ES6's
String.fromCodePoint.
Despite the fact that readToUnicode method *tried* to parse out-of-UCS2
code points by parsing UTF-16BE, it was inadequate because
String.fromCharCode only supports UCS-2 range of Unicode.
All objects in the PDF document follow this pattern:
```
0000000001 0 obj
<<
% Some content here...
>>
endobj
0000000002 0 obj
<<
% More content here...
endobj
```
In many cases in the code you don't actually care about the index itself, but rather just want to know if something exists in a String/Array or if a String starts in a particular way. With modern JavaScript functionality, it's thus possible to remove a number of existing `indexOf` cases.
This will allow the Metadata to be successfully extracted from the PDF file in issue 10395.
Furthermore, this patch also fixes a bug in `Metadata.get` which causes the method to return `null` rather than an empty string or zero (since either ought to be allowed).
Previously a couple of different attempts at fixing this problem has been rejected, given how *crucial* this code is for the correct function of the viewer, since no one has thus far provided any evidence that the problem actually affects the default viewer[1] nor an example using the viewer components directly (without another library on top).
The fact that none of the prior patches contained even a *simple* unit-test probably contributed to the unwillingness of a reviewer to sign off on the suggested changes.
However, it turns out that it's possible to create a reduced test-case, using the default viewer, that demonstrates the error[2]. Since this utilizes a hidden `<iframe>`, please note that this error will thus affect Firefox as well.
Note that while errors are thrown when the hidden `<iframe>` loads, the default viewer doesn't break completely since rendering does start working once the `<iframe>` becomes visible (although the errors do break the initial Toolbar state).
Before making any changes here, I carefully read through not just the immediately relevant code but also the rendering code in the viewer (given it's dependence on `getVisibleElements`). After concluding that the changes should be safe in general, the default viewer was tested without any issues found. (The above being much easier with significant prior experience of working with the viewer code.)
Finally the patch also adds new unit-tests, one of which explicitly triggers the relevant code-path and will thus fail with the current `master` branch.
This patch also makes `PDFViewerApplication` slightly more robust against errors during document opening, to ensure that viewer/document initialization always completes as expected.
Please keep in mind that even though this patch prevents an error in `getVisibleElements`, it's still not possible to set the initial position/zoom level/sidebar view etc. when the viewer is hidden since rendering and scrolling is completely dependent[3] on being able to actually access the DOM elements.
---
[1] And hence the PDF Viewer that's built-in to Firefox.
[2] Copy the HTML code below and save it as `iframe.html`, and place the file in the `web/` folder. Then start the server, with `gulp server`, and navigate to http://localhost:8888/web/iframe.html
```html
<!DOCTYPE html>
<html>
<head>
<title>Iframe test</title>
<script>
window.onload = function() {
const button = document.getElementById('button1');
const frame = document.getElementById('frame1');
button.addEventListener('click', function(evt) {
frame.hidden = !frame.hidden;
});
};
</script>
</head>
<body>
<button id="button1">Toggle iframe</button>
<br>
<iframe id="frame1" width="800" height="600" src="http://localhost:8888/web/viewer.html" hidden="true"></iframe>
</body>
</html>
```
[3] This is an old, pre-exisiting, issue that's not relevant to this patch as such (and it's already being tracked elsewhere).
The error was triggered for a particular set of metadata, where an end tag was encountered without the corresponding begin tag being present in the data.
(The patch also fixes a minor oversight, from a recent PR, in the `SimpleDOMNode.nextSibling` method.)