Commit Graph

15108 Commits

Author SHA1 Message Date
Brendan Dahl
3a31b7ef0c
Merge pull request #14258 from Snuffleupagus/issue-14256-2
Always prefer abbreviated keys, over full ones, when doing any dictionary lookups (issue 14256)
2021-11-11 14:23:43 -08:00
Jonas Jenwald
8eed0b9145 Report "pageInfo" telemetry once, rather than for each rendered page
Reporting telemetry, in Firefox, includes using `JSON.stringify` on the data and then sending an event to the `PdfStreamConverter.jsm`-code.
In that code the event is handled and `JSON.parse` is used to retrieve the data, and in the "pageInfo"-case we'll then proceed to ignore everything except *the first* such event; see https://searchfox.org/mozilla-central/rev/24fac1ad31fb9c6e9c4c767c6a7ff45d226078f3/toolkit/components/pdfjs/content/PdfStreamConverter.jsm#509-514

All-in-all, sending the "pageInfo" telemetry for each rendered page is thus unnecessary and this patch makes the viewer send it only *once* instead.
2021-11-11 12:36:06 +01:00
Jonas Jenwald
ea1c348c67 Always prefer abbreviated keys, over full ones, when doing any dictionary lookups (issue 14256)
Note that issue 14256 was specifically about *inline* images, please refer to:
 - https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.1852045
 - https://www.pdfa.org/safedocs-unearths-pdf-inline-image-issue/
 - https://pdf-issues.pdfa.org/32000-2-2020/clause08.html#H8.9.7

However, during review of the initial PR in https://github.com/mozilla/pdf.js/pull/14257#issuecomment-964469710, it was suggested that we instead do this *unconditionally for all* dictionary lookups.
In addition to re-ordering the existing call-sites in the `src/core`-code, and adding non-PRODUCTION/TESTING asserts to catch future errors, for consistency a number of existing `if`/`switch`-blocks were re-factored to also check the abbreviated keys first.
2021-11-10 11:56:18 +01:00
Brendan Dahl
4ee906adf4
Merge pull request #14209 from Snuffleupagus/issue-14205
[Google Chrome] Ensure that `markedContent` spans are placed in the top-left corner (issue 14205)
2021-11-09 07:59:14 -08:00
calixteman
4bb9de4b00
Merge pull request #14239 from calixteman/1739502
XFA - Fix a breakBefore issue when target is a contentArea and startNew is 1 (bug 1739502)
2021-11-08 03:14:42 -08:00
Jonas Jenwald
27e461a897 [Chromium addon] Add the Page scrolling mode to the options (PR 14112 follow-up) 2021-11-08 10:18:25 +01:00
Jonas Jenwald
8064318015
Merge pull request #14250 from calixteman/14249
XFA - Encode tag names in UTF-8 when saving (fix #14249)
2021-11-07 22:28:53 +01:00
Calixte Denizet
13ae6d493a XFA - Encode tag names in UTF-8 when saving (fix #14249) 2021-11-07 21:41:37 +01:00
Tim van der Meij
891f21fba6
Merge pull request #14245 from Snuffleupagus/PDFPageViewBuffer-class
Convert `PDFPageViewBuffer` to a standard class, and use a `Set` internally
2021-11-07 14:37:33 +01:00
Jonas Jenwald
13ef763222 [Google Chrome] Ensure that markedContent spans are placed in the top-left corner (issue 14205)
*This is a tentative patch, since we unfortunately cannot easily test it (as far as I can tell).*

In Firefox this (obviously) works as-is, but in Google Chrome the `markedContent` spans are inserted within the regular text-content (in the DOM) and with non-zero heights.
2021-11-07 11:01:35 +01:00
calixteman
efb4455749
Merge pull request #14240 from calixteman/14014
XFA - Get each page asynchronously in order to avoid blocking the event loop (#14014)
2021-11-06 13:21:43 -07:00
Tim van der Meij
b820beb8d9
Merge pull request #14244 from Snuffleupagus/issue-14243
Prevent mobile devices from interfering with the textLayer-elements (issue 14243)
2021-11-06 19:30:47 +01:00
Calixte Denizet
1681e25008 XFA - Get each page asynchronously in order to avoid blocking the event loop (#14014) 2021-11-06 13:25:03 +01:00
Jonas Jenwald
12d41bcba4 Prevent mobile devices from interfering with the textLayer-elements (issue 14243)
*This is a tentative patch, since I don't have the necessary hardware to test it.*

See https://developer.mozilla.org/en-US/docs/Web/CSS/text-size-adjust, which is currently ignored in Firefox.
It seems overall safer, and more future-proof, to simply add this to the *entire* `textLayer` rather than its individual elements.
2021-11-06 11:39:43 +01:00
Jonas Jenwald
a774707e31 Remove the moveToEndOfArray helper function, since it's unused
With the previous patch, this helper function is no longer used and keeping it around will simply increase the size of the builds.
This removal is purposely done *separately*, to make it easy to revert the patch in the future if this helper function would become useful again.
2021-11-06 10:19:17 +01:00
Jonas Jenwald
f55bf42398 Convert PDFPageViewBuffer to use a Set internally
This relies on the fact that `Set`s preserve the insertion order[1], which means that we can utilize an iterator to access the *first* stored view.

Note that in the `resize`-method, we can now move the visible pages to the back of the buffer using a single loop (hence we don't need to use the `moveToEndOfArray` helper function any more).

---
[1] This applies to `Map`s as well, although that's not entirely relevant here.
2021-11-06 10:19:17 +01:00
Jonas Jenwald
0eba15b43a Convert PDFPageViewBuffer to a standard class
This patch makes use of private `class` fields, to ensure that the previously "private" properties remain as such.
2021-11-06 10:19:17 +01:00
Jonas Jenwald
c62bcb55ac Adjust the "handles resize correctly, with idsToKeep provided" unit-test (PR 14238 follow-up)
This small change will help validate an important part of the upcoming re-factoring, regarding the *correct* iteration of the `Set` in the `PDFPageViewBuffer.resize` method in particular.
2021-11-06 10:19:17 +01:00
Jonas Jenwald
38efd13a54
Merge pull request #14230 from brendandahl/reduce-gradient-size
Create shading patterns the size of the current path. (bug 1722807)
2021-11-06 10:17:49 +01:00
Brendan Dahl
b56cca0324 Create shading patterns the size of the current path. (bug 1722807)
Previously, when we created a shading pattern canvas we created it
as the same size as the page. This was good for caching if the same
pattern was used over and over again, but when lots of different
shadings are created that caused us to create many full page
canvases.

Instead of creating the full page canvses, create the canvas
as the same size as the current path bounding box. This reduces memory
consumption by a lot since most paths are pretty small. Also, in real world
PDFs it's rare for a shading (non shading fill) to be reused over and over again.
Bug 1721949 is an example where the same pattern is reused and it will be slightly
slower than before.
2021-11-05 20:44:18 -07:00
Brendan Dahl
3b5a463357
Merge pull request #14241 from brendandahl/group-bbox-matrix
Don't double apply a group xobject's bbox.
2021-11-05 20:39:02 -07:00
Brendan Dahl
8161d3f29d Don't double apply a group xobject's bbox.
In `beginGroup` we create a new canvas that is the size of the
bounding box and we translate it to the offset. This means we don't need to
also apply the bounding box during `paintFormXObjectBegin`.

This improves #6961 quite a bit, but it still is missing the indention
in the ruler.
2021-11-05 15:40:58 -07:00
Tim van der Meij
30bd5f0a39
Merge pull request #14238 from Snuffleupagus/PDFPageViewBuffer-tests
Add a couple of basic unit-tests for `PDFPageViewBuffer`
2021-11-05 19:58:15 +01:00
Jonas Jenwald
fe205efd8d Add a couple of basic unit-tests for PDFPageViewBuffer
The `PDFPageViewBuffer`-code is very important for the correct function of the viewer, but it's currently not tested at all.
While the `PDFPageViewBuffer` is obviously intended to be used with `PDFPageView`-instances, it only accesses a couple of `PDFPageView` properties/methods and consequently it's fairly easy to unit-test this code with dummy-data.

These unit-tests should help improve our confidence in this code, and will also come in handy with other changes that I'm working on (regarding modernizing and re-factoring the `PDFPageViewBuffer`-code).
2021-11-05 19:43:20 +01:00
Calixte Denizet
a08763f4aa XFA - Fix a breakBefore issue when target is a contentArea and startNew is 1 (bug 1739502)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1739502;
 - when the target area was the current content area, everything was pushed in it instead of creating a new one (and consequently a new pageArea is created).
 - the pdf shows an alignment issue on page 4:
   - the hAlign is "center" but the subform was the width of its parent, so compute the real width of the subform with tb layout;
 - there is an extra empty page at the end of the pdf:
   - there is a subform with some hidden elements which are not rendered for now (since there is no plugged JS engine it isn't possible to draw them in changing their visibility).
   - so in case a subform is empty and has no real dimensions (at least one is 0), we just consider it as empty.
2021-11-05 18:59:55 +01:00
calixteman
e136afbabc
Merge pull request #14218 from janekotovich/subform_min_0
XFA subform with occur min=0 and no bound data displaying.
2021-11-05 04:12:34 -07:00
Jonas Jenwald
8222d6530b
Merge pull request #14232 from brendandahl/show-text-pattern
Use correct matrix for patterns with showText.
2021-11-05 10:04:56 +01:00
Brendan Dahl
1c7048399b Use correct matrix for patterns with showText.
We were incorrectly using the transform in the pattern before it had been
adjusted causing the pattern to be misplaced relative to the page.

Fixes: ShowText-ShadingPattern.pdf (already in corpus)
Fixes: #8111
Fixes: #9243
2021-11-04 16:57:36 -07:00
Jane-Kotovich
56b502391c XFA subform with occur min=0 and no bound data displaying
Subfrom nomin displays even though it's subform is set to <occur max=-1 min=0>
If we look through specs of XFA 3.3 : https://www.pdfa.org/norm-refs/XFA-3_3.pdf
- The min attribute is used when processing a form that contains data. Regardless of the data at least this number of instances is included. It is permissible to set this value to zero, in which case the container is entirely excluded if there is no data for it.

However, in our case it doesn't happen, because we let our empty dataNode get through. Though by setting a clause:
- eliminate unmatched data with occur min=0
we are checking our empty data and sending it to uselessNode array where at the end it gets removed;
2021-11-04 20:22:05 +10:00
Jonas Jenwald
611627f5a1
Merge pull request #14219 from Snuffleupagus/getVisibleElements-ids
Let `getVisibleElements` return a Set containing the visible element `id`s
2021-11-03 23:49:27 +01:00
Jonas Jenwald
e1a35e7bb6
Merge pull request #14213 from Snuffleupagus/issue-11656
Tweak the Bidi-detection heuristics for very short RTL strings (issue 11656)
2021-11-03 22:09:14 +01:00
Jonas Jenwald
c2f335186a
Merge pull request #14228 from brendandahl/consume-path
Reset path bounding box tracking when starting a new path.
2021-11-03 21:56:46 +01:00
Jonas Jenwald
e78e4e72bf Further modernize PDFThumbnailViewer.scrollThumbnailIntoView
The way that we're currently handling the last-`id` is very old, and there's no longer any good reason to special-case things when only one thumbnail is visible.
Furthermore, we can also modernize the loop slightly by using `for...of` instead of `Array.prototype.some()` when checking for fully visible thumbnails.
2021-11-03 21:13:47 +01:00
Jonas Jenwald
6323f8532a Let getVisibleElements return a Set containing the visible element ids
Note how in `PDFPageViewBuffer.resize` we're manually iterating through the visible pages in order to build a Set of the visible page `id`s. By instead moving the building of this Set into the `getVisibleElements` helper function, as part of the existing parsing, this code becomes *ever so slightly* more efficient.

Furthermore, more direct access to the visible page `id`s also come in handy in other parts of the viewer as well.
In the `BaseViewer.isPageVisible` method we no longer need to loop through the visible pages, but can instead directly check if the pageNumber is visible.
In the `PDFRenderingQueue.getHighestPriority` method, when checking for "holes" in the page layout, we can also avoid some unnecessary look-ups this way.
2021-11-03 21:13:44 +01:00
Jonas Jenwald
5f77d3719b Tweak the Bidi-detection heuristics for very short RTL strings (issue 11656)
Very short strings can narrowly miss the existing Bidi-detection threshold, leading to incorrect text-selection and copying behaviour.

In my testing, neither Adobe Reader or PDFium seem to handle copying "correctly" for this document. Hence it's not entirely clear to me that we actually want to fix this, since tweaking these heuristics can *obviously* cause regressions elsewhere (and our test coverage for RTL-text isn't exactly great).
2021-11-03 20:31:57 +01:00
Tim van der Meij
2ac6c939a5
Merge pull request #14225 from Snuffleupagus/render-better-holes-check
Avoid doing unnecessary checks, when pre-rendering page layouts with "holes" (PR 14131 follow-up)
2021-11-03 19:48:37 +01:00
Brendan Dahl
039a7a670f Reset path bounding box tracking when starting a new path.
Starting a new path will wipe out any of the current subpaths in the
current graphics state, so we should reset the min/maxes.

This makes a number of the bounding boxes smaller and reduces the number
of composed pixels. For the smask tests in the corpus, the number of
composed pixesl goes from 19,872,109 to 19,676,905. The difference is much
larger on other PDFs though.
2021-11-03 11:46:52 -07:00
Tim van der Meij
c68dc03be6
Merge pull request #14221 from Snuffleupagus/pr-12870-followup
Use `BaseViewer.previousPage` more in the default viewer (PR 12870 follow-up)
2021-11-03 19:44:12 +01:00
Jonas Jenwald
ab5f4a3e5e Avoid doing unnecessary checks, when pre-rendering page layouts with "holes" (PR 14131 follow-up)
*Sometimes I'll hopefully learn to optimize my code directly when writing it, rather than having to do multiple clean-up passes; sorry about the churn here!*

For most page layouts there won't be any "holes" in the visible pages (or thumbnails), and in those cases it'd obviously be preferable not having to repeat any checks of already rendered pages.
Rather than only checking the "distance" between the first/last pages, we can instead compare the theoretical number of pages (between first/last) with the actually visible number of pages instead. This way, we're able to better detect the "holes"-case and can skip unnecessary parsing in the common case.
2021-11-03 09:40:39 +01:00
Brendan Dahl
60ab751bb6
Merge pull request #14217 from Snuffleupagus/bug-1732141
[Firefox] Handle errors if loading failed before the "supportsRangedLoading" message was sent (bug 1732141)
2021-11-02 13:32:17 -07:00
Jonas Jenwald
292a715c1c Use optional chaining to simplify PDFViewerApplication.store accesses in various event handlers
This way we no longer need the intermediate variables.
2021-11-02 12:00:57 +01:00
Jonas Jenwald
d6e8b8fbc1 Use BaseViewer.previousPage more in the default viewer (PR 12870 follow-up)
I missed this one spot in PR 12870, when converting the other cases in the "keydown" event handler. However, given that it only matters in PresentationMode and/or when "page-fit" zooming is enabled, this oversight shouldn't have had any user-observable impact (but we should fix it nonetheless).
2021-11-02 11:48:18 +01:00
Jonas Jenwald
f4e88b0a57 [Firefox] Handle errors if loading failed before the "supportsRangedLoading" message was sent (bug 1732141)
This is a follow-up to PR 10675, since there I completely overlooked that we also need to handle the case where a PDF document has *failed* to load when the "supportsRangedLoading" message is sent to the viewer.
2021-11-01 17:50:49 +01:00
Tim van der Meij
6a15973a1b
Merge pull request #14212 from Snuffleupagus/issue-10301-test
Add a RTL-text reference test (issue 10301)
2021-10-31 17:57:50 +01:00
Jonas Jenwald
8edec018fe Add a RTL-text reference test (issue 10301)
It seems that issue 10301 was fixed by PR 13424, by combining the spans, however given that we don't have a lot of test coverage for RTL-text I figured that adding a simple reference test wouldn't hurt (rather than just closing the issue as WORKSFORME).
2021-10-31 16:55:11 +01:00
Jonas Jenwald
8c70258065
Merge pull request #14182 from calixteman/richtext
Support rich content in markup annotation
2021-10-31 14:41:56 +01:00
Calixte Denizet
cf8dc750d6 Support rich content in markup annotation
- use the xfa parser but in the xhtml namespace.
2021-10-31 13:44:51 +01:00
Tim van der Meij
fe11711325
Merge pull request #14210 from Snuffleupagus/update-packages
Update packages and translations
2021-10-31 13:13:25 +01:00
Jonas Jenwald
822856e044 Update l10n files 2021-10-31 09:56:41 +01:00
Jonas Jenwald
48896adc27 Enable the ESLint no-unused-private-class-members rule
Given that we're now using private `class` fields/methods, enabling this new ESLint rule should help avoid adding dead code.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-unused-private-class-members
2021-10-31 09:52:07 +01:00