Jasmine recommends to use the `configure` method on the environment
during boot. This commit makes the code correspond to how it's done in
Jasmine's default boot file. The options dropdown in the HTML reporter
now works again after these changes, because this broke in the upgrade
to Jasmine 3, and the unit tests are executed in a random order by
default, which is important to make sure the unit tests are
self-contained and don't depend on the result of another unit test.
This line in the annotation tests subtracts an array from a number. This is because `splice(-1, 1)` returns a one-element array, while `pop()` returns only the element itself.
*Please note:* I'm totally fine with this patch being rejected, and the issue closed as WONTFIX; however these changes should address the issue if that's desired.
From a conceptual point of view, reporting loading progress doesn't really make a lot of sense for PDF files opened by passing raw binary data directly to `getDocument` (since obviously *all* data was loaded).
This is compared to PDF files loaded via e.g. `XMLHttpRequest` or the Fetch API, where the entire PDF file isn't available from the start and knowing the loading progress makes total sense.
However I can certainly see why the current API could be considered inconsistent, which isn't great, since a registered `onProgress` callback will never be called for certain `getDocument` calls.
The simplest solution to this inconsistency thus seem to be to ensure that `onProgress` is always called when handling the `DataLoaded` message, since that will *always* be dispatched[1] from the worker-thread.
---
[1] Note that this isn't guaranteed to happen, since setting `disableAutoFetch = true` often prevents the *entire* file from ever loading. However, this isn't relevant for the issue at hand, and is a well-known consequence of using `disableAutoFetch = true`; note how the default viewer even has a specialized code-path for hiding the loadingBar.
*This patch is based on something that I noticed while working on PR 10126.*
The recent re-factoring of `PDFFindController` brought many improvements, among those the fact that access to `BaseViewer` is no longer required. However, with these changes there's one thing which now strikes me as not particularly user-friendly[1]: The fact that in order for searching to actually work, `PDFFindController.setDocument` must be called *and* a 'pagesinit' event must be dispatched (from somewhere).
For all other viewer components, calling the `setDocument` method[2] is enough in order for the component to actually be usable.
The `PDFFindController` thus stands out quite a bit, and it also becomes difficult to work with in any sort of custom implementation. For example: Imagine someone trying to use `PDFFindController` separately from the viewer[3], which *should* now be relatively simple given the re-factoring, and thus having to (somehow) figure out that they'll also need to manually dispatch a 'pagesinit' event for searching to work.
Note that the above even affects the unit-tests, where an out-of-place 'pagesinit' event is being used.
To attempt to address these problems, I'm thus suggesting that *only* `setDocument` should be used to indicate that searching may start. For the default viewer and/or the viewer components, `BaseViewer.setDocument` will now call `PDFFindController.setDocument` when the document is ready, thus requiring no outside configuration anymore[4]. For custom implementation, and the unit-tests, it's now as simple as just calling `PDFFindController.setDocument` to allow searching to start.
---
[1] I should have caught this during review of PR 10099, but unfortunately it's sometimes not until you actually work with the code in question that things like these become clear.
[2] Assuming, obviously, that the viewer component in question actually implements such a method :-)
[3] There's even a very recent issue, filed by someone trying to do just that.
[4] Short of providing a `PDFFindController` instance when creating a `BaseViewer` instance, of course.
This commit shows that we can now unit test the find controller and
that executing regular queries works. Note that this is only a first
step and not a complete suite of unit tests for all possible options
of the find controller.
While writing this unit test, I found two smaller issues that I
addressed directly. The first one is that in the previous find
controller refactoring I forgot to rename some occurrences of a now
private member variable. Fortunately this did not cause any bugs since
we did have a public getter and the fetched value may be changed by
reference, but it's nevertheless good to fix. The second issue is that
some entries in the `test/unit/clitests.json` file were not correct,
resulting in these tests not being executed on e.g., Travis CI.
As outlined in https://bugzilla.mozilla.org/show_bug.cgi?id=1282759 the internal Firefox name for the feature is `entireWord`, hence that name is used here as well for consistency (with "Whole words" being limited to the UI).
Given existing limitations of the PDF.js search functionality, e.g. the existing problems of searching across "new lines", there's some edge-cases where "Whole words" searching will ignore (valid) results.
However, considering that this is a pre-existing issue related to the way that the find controller joins text-content together, that shouldn't have to block this new feature in my opionion.
*Please note:* In order to enable this feature in the `MOZCENTRAL` version, a small follow-up patch for [PdfjsChromeUtils.jsm](https://hg.mozilla.org/mozilla-central/file/tip/browser/extensions/pdfjs/content/PdfjsChromeUtils.jsm) will be required once this has landed in `mozilla-central`.
There have been lots of problems with trying to map glyphs to their unicode
values. It's more reliable to just use the private use areas so the browser's
font renderer doesn't mess with the glyphs.
Using the private use area for all glyphs did highlight other issues that this
patch also had to fix:
* small private use area - Previously, only the BMP private use area was used
which can't map many glyphs. Now, the (much bigger) PUP 16 area can also be
used.
* glyph zero not shown - Browsers will not use the glyph from a font if it is
glyph id = 0. This issue was less prevalent when we mapped to unicode values
since the fallback font would be used. However, when using the private use
area, the glyph would not be drawn at all. This is illustrated in one of the
current test cases (issue #8234) where there's an "ä" glyph at position
zero. The PDF looked like it rendered correctly, but it was actually not
using the glyph from the font. To properly show the first glyph it is always
duplicated and appended to the glyphs and the maps are adjusted.
* supplementary characters - The private use area PUP 16 is 4 bytes, so
String.fromCodePoint must be used where we previously used
String.fromCharCode. This is actually an issue that should have been fixed
regardless of this patch.
* charset - Freetype fails to load fonts when the charset size doesn't match
number of glyphs in the font. We now write out a fake charset with the
correct length. This also brought up the issue that glyphs with seac/endchar
should only ever write a standard charset, but we now write a custom one.
To get around this the seac analysis is permanently enabled so those glyphs
are instead always drawn as two glyphs.
This patch is the first step to be able to eventually get rid of the `attachDOMEventsToEventBus` function, by allowing `EventBus` instances to simply re-dispatch most[1] events to the DOM.
Note that the re-dispatching is purposely implemented to occur *after* all registered `EventBus` listeners have been serviced, to prevent the ordering issues that necessitated the duplicated page/scale-change events.
The DOM events are currently necessary for the `mozilla-central` tests, see https://hg.mozilla.org/mozilla-central/file/tip/browser/extensions/pdfjs/test, and perhaps also for custom deployments of the PDF.js default viewer.
Once this have landed, and been successfully uplifted to `mozilla-central`, I intent to submit a patch to update the test-code to utilize the new preference. This will thus, eventually, make it possible to remove the `attachDOMEventsToEventBus` functionality.
*Please note:* I've successfully ran all `mozilla-central` tests locally, with these patches applied.
---
[1] The exception being events that originated on the `window` or `document`, since those are already globally available anyway.
Apparently there's some PDF generators, in this case the culprit is "Nooog Pdf Library / Nooog PStoPDF v1.5", that manage to mess up PDF creation enough that endstream[1] commands actually become truncated.
*Please note:* The solution implemented here isn't perfect, since it won't be able to cope with PDF files that contains a *mixture* of correct and truncated endstream commands.
However, considering that this particular mode of corruption *fortunately* doesn't seem very common[2], a slightly less complex solution ought to suffice for now.
Fixes 10004.
---
[1] Scanning through the PDF data to find endstream commands becomes necessary, in order to determine the stream length in cases where the `Length` entry of the (stream) dictionary is missing/incorrect.
[2] I cannot recall having seen any (previous) issues/bugs with "Missing endstream" errors.
Please note that while this *improves* issue 9984 slightly (and likely others too), it's not a complete solution.
The remaining issues are related to the, more general, problems with the existing heuristics related to attempting to combine separate text items.
This commit is the first step towards implementing parsing for the
appearance streams of annotations.
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Co-authored-by: Tim van der Meij <timvandermeij@gmail.com>
The font tests use Jasmine too, so while they are technically unit
tests, it's a bit confusing to see `Started unit tests` when the font
tests are run on the bots.
The font in the PDF is marked as a CIDFontType0, but the font file is
actually a true type font. To fully address this issue we should really
peek into the font file and try to determine what it is. However, this
is the first case of this issue, so I think this solution is acceptable for
now.
This should really have been included in PR 9868, since it will help ensure that the `URL` constructor is correctly imported/exported by `src/shared/util.js`.
According to the PDF specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=45
> When using the JPXDecode filter with image XObjects, the following changes to and constraints on some entries in the image dictionary shall apply (see 8.9.5, "Image Dictionaries" for details on these entries):
>
> - Width and Height shall match the corresponding width and height values in the JPEG2000 data.
>
> - . . .
Hence it seems reasonable to use the Width/Height of the image data *itself*, rather than the image dictionary when there's a mismatch. Given that JPEG 2000 images are already being parsed, in order to obtain basic parameters, the actual Width/Height is readily available in the `PDFImage` constructor.