The PDF.js API has only ever supported accessing the original file ID, however the second one that (should) exist in *modified* documents have thus far been completely inaccessible through the API.
That seems like a simple oversight, caused e.g. by the viewer not needing it, since it really shouldn't hurt to provide API-users with the ability to check if a PDF document has been modified since its creation.[1]
Please refer to https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G13.2261661 for additional information.
For an example of how to update existing code to use the new API, please see the changes in the `web/app.js` file included in this patch.
*Please note:* While I'm not sure if we'll ever be able to remove the old `PDFDocumentProxy.fingerprint` getter, given that it's existed since "forever", that probably isn't a big deal given that it's now limited to only `GENERIC`-builds.
---
[1] Although this obviously depends on the PDF software following the specification, by updating the second file ID as intended.
Using `instanceof Object` is generally problematic, since it's not guaranteed to always do the right thing for all Objects.
(I stumbled upon this while working on another patch, when I noticed that the `outlineView` was broken with workers disabled.)
- support paragraph margins, line height, letter spacing, ...
- compute missing dimensions from fields based almost on the dimensions of caption contents.
- it aims to fix#13583;
- fix the switch to breakBefore target;
- force the layout of an unsplittable element on an empty page;
- don't fail when there is horizontal overflow (except in lr-tb);
- handle correctly overflow in the same content area (bug 1717805, bug 1717668);
- fix a typo in radial gradient first argument.
Given that this property is only used with password protected documents, and is consequently document-specific rather than viewer-specific, ensure that `IPDFLinkService.externalLinkEnabled` is actually being reset by `PDFViewerApplication.close`.
To make things less confusing/inconsistent, remove the *undocumented* `externalLinkEnabled` property from the `PDFLinkService` constructor and force it to always be manually set when needed.
- when the CSS line-height property is set to 'normal' then the value depends of the user agent. So use a line height based on the font itself and if for any reasons this value is not available use 1.2 as default.
- it's a partial fix for https://bugzilla.mozilla.org/show_bug.cgi?id=1717681.
This patch provides an overall simpler *and* more consistent way of handling the `viewport` parameter during printing of XFA forms, since it's now again guaranteed to always be an instance of `PageViewport`.
Furthermore, for anyone attempting to e.g. implement custom printing of XFA forms this probably cannot hurt either.
Given that the "print"-intent is special, and that we should always fallback to the "display"-intent, let's ensure that the code actually reflects that.
Also, ensure that the method always returns a `Promise` since that's what the documentation says.
The `web/ui_utils.js` file should be usable from basically anywhere in the `web/`-folder, hence it should ideally not have any dependecies on its own and particularily *not* onces that pull in entire (large) factories.
This extends the approach in PRs 12848 and 13606 to also apply to the `xfaLayer`, since otherwise XFA forms will be similarly broken in most non-default scroll/spread modes.
Note that as far as I can tell, this is *not* a regression but rather a bug which has existed since basically "forever".
**In order to reproduce this easily:**
- Open the viewer.
- Set the zoom level to `400%`,
- Search for "expression".
The problem here is that when scrolling matches into view, we're scrolling to the start of the *containing* `textLayer` element rather than the start of the highlighted match itself.[1] When the entire width (or at least most) of the page is visible in the viewer, that doesn't really matter though which is likely why this bug has gone unnoticed for so long.[2]
Given that the highlighted match can be placed anywhere, e.g. even at the very end, within its `textLayer` element it's quite easy to see why the current implementation becomes a problem at higher zoom levels. All of this is then *further* exacerbated by `PDFFindController.scrollMatchIntoView` using a negative left offset, to ensure that the current match has some (visible) context available once scrolled into view.
In order to address this long-standing bug, we'll determine the (left) offset of the `selected` match and use that to modify the final position scrolled to in `PDFFindController.scrollMatchIntoView` such that the match is visible regardless of zoom level.
---
[1] Unfortunately we cannot directly scroll to the `selected` match, since it's not absolutely positioned and changing that would cause other bugs/regressions (note recent patches in that area).
[2] I did actually stumble upon this problem a little while ago, while working on PR 13482, but forgot to look into this again until I saw the new issue.
With the introduction of `PDFScriptingManager._closeCapability` in PR 13074, the pre-existing `PDFScriptingManager._pageEventsReady` boolean essentially became redundant.
Given that you always want to avoid tracking closely related state *separately*, since it's easy to introduce subtle bugs that way, we should just remove `PDFScriptingManager._pageEventsReady` now.
Obviously I *should* have done this already back in PR 13074, sorry about the churn here!
- a checkbox or radio doesn't have to be rescaled when the container is large so give the extra space to the caption to avoid some word wrapping.
- when the caption is on the right, then put ui on the left as first element and so remove flex:row-reverse stuff.
- some elements weren't displayed because their rotation angle was not taken into account;
- fix box model (XFA concept):
- remove use of outline;
- position correctly border which isn't part of box dimensions;
- fix margins issues (see issue #13474).
- move border on button instead of having it on wrapping div;
This code was added in PR 3968, apparently in order to fix scrolling of search results in HiDPI-mode.
However, after PR 4570 nothing is setting these `dataset`-properties any more and this is thus dead code which should be removed. (If that change had broken scrolling of search results in HiDPI-mode, you'd really expect that it'd been reported and fixed a long time ago.)
I missed this during review, since some of the changes in `web/pdf_print_service.js` broke printing.
Also, as part of these changes the patch replaces what looks like unnecessary `setAttribute` usage with "regular" `className` assignment and finally updates a couple of the CSS-rules to be more consistent.
- I thought it was possible to rely on browser layout engine to handle layout stuff but it isn't possible
- mainly because when a contentArea overflows, we must continue to layout in the next contentArea
- when no more contentArea is available then we must go to the next page...
- we must handle breakBefore and breakAfter which allows to "break" the layout to go to the next container
- Sometimes some containers don't provide their dimensions so we must compute them in order to know where to put
them in their parents but to compute those dimensions we need to layout the container itself...
- See top of file layout.js for more explanations about layout.
- fix few bugs in other places I met during my work on layout.
This patch fixes the referenced bugs/issues, in a way that won't interfere with keyboard users, assuming that we actually want to fix these old bugs/issues. (If not, we should close them as WONTFIX.)
After PR 13117 it's now (finally) possible for *different* build targets to specify individual options/preferences, and we can utilize that to only expose the `renderer`-preference in builds where `SVGGraphics` is actually defined.
Note that for e.g. `MOZCENTRAL`-builds, trying to enable SVG-rendering will throw immediately and the preference thus doesn't make sense to include there.
Also, update the dummy `SVGGraphics` to use a class, tweak the `PDFJSDev`-check in `src/display/svg.js` to agree fully with the option/preference, and remove an unnecessary `eslint-disable`.
Reasons for the removal include:
- This functionality was always somewhat experimental and has never been enabled by default, partly because of worries about rendering bugs caused by e.g. bad/outdated graphics drivers.
- After the initial implementation, in PR 4286 (back in 2014), no additional functionality has been added to the WebGL implementation.
- The vast majority of all documents do not benefit from WebGL rendering, since only a couple of *specific* features are supported (e.g. some Soft Masks and Patterns).
- There is, and has always been, *zero* test-coverage for the WebGL implementation.
- Overall performance, in the PDF.js library, has improved since the experimental WebGL implementation was added.
Rather than shipping unused *and* untested code, it seems reasonable to simply remove the WebGL implementation for now; thanks to version control it's always possible to bring back the code should the need ever arise.
This functionality was originally implemented in PR 7029; however it's not, nor has it ever been, used as far as I can tell.[1]
Note in particular that the default viewer does not expose either a preference or even an option with which `disableCanvasToImageConversion` can be toggled, and source-code modification is thus required.
Furthermore, note also that we have multiple other instances of `canvas`-data accesses in both the `src/display/canvas.js` and `src/display/text_layer.js` files. If any of those are blocked, by e.g. browser settings, there will be outright rendering bugs and non-working thumbnails thus seem like a very small issue in the grand scheme of things; hence why I'm suggesting that we remove the unused `disableCanvasToImageConversion` functionality.
---
[1] For the Tor use-case mentioned in issue 7026, I *believe* that the solution was to white-list `canvas`-data accesses for its built-in PDF Viewer.
- app.alert and few other function can use an object as parameter ({cMsg: ...});
- support app.alert with a question and a yes/no answer;
- update field siblings when one is changed in an action;
- stop calculation if calculate is set to false in the middle of calculations;
- get a boolean for checkboxes when they've been set through annotationStorage instead of a string.
Also, removes a couple of unnecessary local variables in the `Stepper.breakIt` method.
Finally, this patch also disables the ESLint `no-var` rule, in preparation for the next patch, for a couple of data-structures that need to remain globally available.
Given that both the textLayer rendering *and* the structTree parsing is asynchronous, it's possible that we'll attempt to insert the structTree in a removed page. While there's thankfully no outright breakage caused by this, it will nonetheless lead to errors being printed in the console and we should obviously avoid this.
To reproduce this bug (without the patch), open http://localhost:8888/web/viewer.html?file=/test/pdfs/pdf.pdf#disableStream=true&disableAutoFetch=true and scroll *very quickly* through the document and notice the following error being (intermittently) printed in the console:
```
Uncaught (in promise) TypeError: can't access property "appendChild", this.canvas is undefined
```
Using `for...of` is a modern and generally much nicer pattern, since it gets rid of unnecessary callback-functions. (In a couple of spots, a "regular" `for` loop had to be used.)
According to a decision by UX and PM, please see https://bugzilla.mozilla.org/show_bug.cgi?id=1705060#c2 (and implemented in https://bugzilla.mozilla.org/show_bug.cgi?id=1705327), we no longer show the notification-bar in Firefox; hence the special `PDFViewerApplication._delayedFallback` functionality should no longer be necessary.
Furthermore, note that at this point in time *most* of the features which used the `PDFViewerApplication._delayedFallback` functionality is now enabled by default; hence that provides even less reason to keep this code around and existing calls are thus converted to "regular" `PDFViewerApplication.fallback` calls.
According to a decision by UX and PM, please see https://bugzilla.mozilla.org/show_bug.cgi?id=1705060#c2, in Firefox we should first of all *not* display the notification-bar for signatures. Secondly, as can also be seen there, we shouldn't display the notification-bar *at all* and it's thus disabled in https://bugzilla.mozilla.org/show_bug.cgi?id=1705327.
If we purposely don't display a notification, for documents with signatures, in the *built in* Firefox PDF Viewer then it cannot be necessary in the GENERIC viewer either.
To simplify the overall implementation, given that it only applies to the GENERIC-viewer, this patch purposely re-uses the existing `errorWrapper`-functionality to display the message.
While that one is mostly intended for actual *errors*, by re-using it here we considerably reduce the amount of code/complexity necessary for supporting this new warning. It's obviously possible to re-factor/improve this later on, but the patch should do just fine here since it'll indeed inform users (of the GENERIC-viewer) about unverified signatures.
Finally this patch also tweaks the background-color of the `errorWrapper`, making it 20 percent lighter respectively darker (depending on the theme) to make it "stand out" a little bit *less*.[1] While it may perhaps be useful to re-style/re-factor the `errorWrapper`, this patch probably isn't the right place for doing that.
---
[1] Note how in the MOZCENTRAL-viewer, which instead uses the browser notification-bar, we're purposely using a neutral colour to not draw too much attention to the notification-bar.
This is first of all consistent with existing API-methods, where we return `null` when the data in question doesn't exist. Secondly, it should also be (slightly) more efficient since there's less dummy-data that we need to transfer between threads.
Finally, this prevents us from adding an empty/unnecessary span to *every* single page even in documents without any structure tree data.
I (unsurprisingly) managed to forget about handling the case where a "pagesloaded" event arrives *before* the outline has been parsed, in which case we'd not actually enable the `currentOutlineButton` as intended.
Also, in the "pagesloaded" event handler, we should ensure that there's actually any pages loaded since otherwise the "find current outlineItem"-feature doesn't make any sense.
- but don't validate them for now;
- Firefox will display a bar to warn that the signature validation is not supported (see https://bugzilla.mozilla.org/show_bug.cgi?id=854315)
- almost all (all ?) pdf readers display signatures;
- validation is done in edge but for now it's behind a pref.
It's obviously better and more correct to handle the "pagesloaded" case within `PDFOutlineViewer` *itself*, rather than essentially splitting the logic in two parts and forcing `PDFSidebar` to deal with what should've been handled internally in `PDFOutlineViewer`.
This is what I *should* have done in PR 12777, but for some reason didn't figure out how to implement it well enough back then; sorry about the churn here!
*This patch fixes some technical debt in the viewer.*
Given that most API methods are (purposely) asynchronous, there's always a risk that the viewer could have been `close`d before the requested data arrives.
Lately we've started to check this case before using the data, to prevent errors and/or inconsistent state, however the outline/attachments/layers fetching and rendering is old enough that it pre-dates those checks.
When a PDF is "marked" we now generate a separate DOM that represents
the structure tree from the PDF. This DOM is inserted into the <canvas>
element and allows screen readers to walk the tree and have more
information about headings, images, links, etc. To link the structure
tree DOM (which is empty) to the text layer aria-owns is used. This
required modifying the text layer creation so that marked items are
now tracked.
Scripting, as implemented, requires access to a complete document/viewer in order to work. Hence it doesn't really make sense to keep the `enableScripting`-option on `PDFPageView`-instances.[1]
---
[1] Note that there's the `PDFSinglePageViewer`, which can be used in cases where you want access to all features/functionality of the viewer but only display *one* page at a time.
Note how we purposely don't expose the `AnnotationStorage`-class directly in the official API (see `src/pdf.js`), since trying to use *multiple* ones simultaneously doesn't really make sense (e.g. in the viewer).
Instead we lazily initialize, and cache, just *one* instance via `PDFDocumentProxy.annotationStorage` which should thus be available internally in the API itself without having to be manually passed to various methods.
To support these changes, the `AnnotationStorage`-instance initialization is moved into the `WorkerTransport`-class to allow both `PDFDocumentProxy` and `PDFPageProxy` to access it.
This patch implements the following simplifications:
- Remove the `annotationStorage`-parameter from `PDFDocumentProxy.saveDocument`, since it's already available internally.
Furthermore, while it's currently possible to call that method without an `AnnotationStorage`-instance, that really does *not* make any sense at all. In this case you're effectively reducing `PDFDocumentProxy.saveDocument` to a "regular" `PDFDocumentProxy.getData` call, but with *a lot* more overhead, which was obviously not the intention of the `PDFDocumentProxy.saveDocument`-method.
- Try to discourage third-party users from calling `PDFDocumentProxy.saveDocument` unconditionally, as a replacement for `PDFDocumentProxy.getData` (note the previous point).
- Replace the `annotationStorage`-parameter, in `PDFPageProxy.render`, with a boolean `includeAnnotationStorage`-parameter which simply indicates if the (internally available) `AnnotationStorage`-instance should be used during rendering (e.g. for printing).
- By removing the need to *manually* provide `annotationStorage`-parameters to various API-methods, using the API should become simpler (e.g. for third-parties) since you no longer need to worry about manually fetching and passing around this data.
The reason for the fairly large discrepancy, in the thumbnail quality, between the `draw`/`setImage`-methods is that in the former case we *directly* render the thumbnails at the final size that they'll appear at in the sidebar. In the latter case, we instead downsize the (generally) much larger "regular" pages.
To address this, I'm thus proposing that we let `PDFThumbnailView.draw` render thumbnails at *twice* their intended size and then downsize them to the final size.
Obviously this will increase *peak* memory usage during thumbnail rendering in `PDFThumbnailView.draw`, since doubling the width/height of a `canvas` will lead to its pixel-count increasing by a factor of `4`. Furthermore, since you need four components per pixel (given that it's RGBA-data), this will thus lead to the *temporary* thumbnail `canvas`-sizes increasing by a factor of `16` during rendering. Hence why rendering thumbnails at their "original" scale, i.e. using something like `PDFPageProxy.getViewport({ scale: 1 });`, would be an absolutely terrible idea!
To reduce the size and scope of these changes, I've tried to re-factor and re-use as much of the existing downsizing-implementation already present in `PDFThumbnailView` as possible.
While this will generally *not* make thumbnails rendered by `PDFThumbnailView.draw` look *identical* to those based on the rendered pages (via `PDFThumbnailView.setImage`), it's a considerable improvement as far as I'm concerned and enough to call the issue fixed.
*Please note:* This patch will not lead to *any* additional overhead, in either memory usage or parsing, for thumbnails which are based on the rendered pages.
A loop is less efficient than just overwriting the content, which is what we've generally been using (for years) in other parts of the code-base (see e.g. `BaseViewer` and `PDFThumbnailViewer`).
These properties are always updated/used together, and there's no other methods which depend on just one of them, hence they're changed into local variables instead.
Looking through the history of this code, it seems they were converted *from* local variables and to properties all the way back in PR 2914; however as far as I can tell from that diff it doesn't seem to have been necessary even back then!?