When the user edits the URL and changes the reference fragment (hash),
PDF.js intercepts this action, and saves the then-current history state
in the previous history entry. This is implemented by navigating back,
editing the history and navigating forward again.
The current logic has a flaw: It assumes that calling history.back() and
history.forward() immediately updates the history state. This is however
not guaranteed by the web standards, which states that calling e.g.
history.back "must traverse the history by a delta -1", which means that
the browser must QUEUE a task to traverse the session history, per spec:
http://w3.org/TR/2011/WD-html5-20110113/history.html#dom-history-backhttps://html.spec.whatwg.org/multipage/browsers.html#dom-history-back
Firefox and Internet Explorer deviate from the standards by immediately
changing the history state instead of queuing the navigation.
WebKit derived browsers (Chrome, Opera, Safari) and Opera presto do not.
The user-visible consequence of strictly adhering to the standards in
PDF.js can be shown as follows:
1. Edit the URL.
2. Append #page=2 for example.
3. Press Enter.
-> Presto and WebKit: PDF.js reverts to the previous URL.
-> Gecko and Trident: PDF.js keeps the new URL, as expected.
To fix the issue, modification of the previous history item happens in
a few asynchronous steps, guided by the popstate event to detect when
the history navigation request has been committed.
--
Some more implementation notes:
I have removed the preventDefault and stopPropagation calls, because
popstate is not cancelable, and window is already the last target of the
event propagation.
The previous allowHashChange logic was hard to follow, because it did
not explain that hashchange will be called twice; once during the
popstate handler for history.back() (which will reset allowHashChange),
and again for history.forward() (where allowHashChange will be false).
The purpose of allowHashChange is now more explicit, by incorporating
the logic in the replacePreviousHistoryState helper function.
*With this patch we're getting very close to fixing 6158.*
The only use-case for `PDFViewerApplication.updateScaleControls` is to try and avoid calling `selectScaleOption` from the `scalechange` event handler in viewer.js.
This will *only* happen when the user has manually changed the scale by using the `<select>` dropdown, which means that in reality this is just a micro optimization. Furthermore, `selectScaleOption` is only skipped for the "named" scale values (e.g. `auto`, `page-actual`, `page-fit`, `page-width`), thus further reducing the value of this code.
Also, since we're updating the scale `<select>` dropdown from an event handler, we're currently depending on the event being dispatched (and handled) completely before the next `scalechange` event. Relying on the execution order of the code in this way, even though it currently works, seems unfortunate since it *could* potentially cause the internal scale value and the UI from getting out of sync.
Xref offsets are relative to the start of the PDF data, not to the start
of the PDF file. This is clear if you look at the other code:
- In the XRef's readXRefTable and processXRefTable methods of XRef, the
offset of a xref entry is set to the bytes as given by a PDF file.
These values are always relative to the start of the PDF file (%PDF-).
- The XRef's readXRef method adds the start offset of the stream to
Xref entry's offset: "stream.pos = startXRef + stream.start".
Clearly, this line assumes that the entry offset excludes the start
offset.
However, when the PDF is parsed in recovery mode, the xref table is
filled with entries whose offset is relative to the start of the stream
rather than the PDF file. This is incorrect, and the fix is to subtract
the start offset of the stream from the entry's byte offset.
The manually created PDF file serves as a regression test. It is a valid
PDF, except:
- The integer to point to the start of the xref table and the %%EOF
trailer are missing. This will activate recovery mode in PDF.js
- Some junk was added before the start of the PDF file. This exposes the
bad offset bug.
The PDF specification (cited below) specifies a maximum length of a name
in bytes as a minimal architectural limit. This means that PDF *writers*
should not create names that exceed 127 bytes.
It does not forbid PDF *readers* to accept such names though. These
names are only used internally to link PDF objects to other objects. For
these use cases, the lengths of the names do not really matter. Hence I
have changed the implementation to not treat long names as errors, but
warnings.
> (7.3.5) The length of a name shall be subject to an implementation
> limit; see Annex C.
>
> (Annex C.2) Table C.1 describes the minimum architectural limits that
> should be accommodated by conforming readers running on 32-bit
> machines. Because conforming readers may be subject to these limits,
> conforming writers producing PDF files should remain within them.
>
> (Table C.1) name 127 "Maximum length of a name, in bytes."
http://adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
*This is the next step towards fixing 6158.*
This patch removes the dependency on the state of the scale `<select>` dropdown from the `resize` event handler, and instead uses the (in `PDFViewer`) stored `currentScaleValue`.
I believe that the way this code is currently written is purely for historical reasons, since originally *only* the numerical scale was stored internally (hence there was no other way to access the scale value).
However, since we now store the scale value, we should use it instead of quering the DOM. This helps ensure that the internally stored scale value is always accurately displayed in the UI (which should be good since, after the creation of `PDFViewer`, the `<select>` DOM element is now updated by an event handler).
*The next step towards fixing issue 6158.*
We can just as well access `pdfViewer.currentScaleValue` directly in `PDFViewerApplication`, instead of having a helper function which just acts as a wrapper for it.
Currently if the zoom level is reset multiple times in a row, i.e. by pressing <kbd>Ctrl</kbd>+<kbd>0</kbd>, the pages can be re-rendered each time even though their size shouldn't change. Whether this happens can depend on the size of the viewer, but documents with pages in landscape mode seem to be very susceptible to this. (An example is: https://wiki.mozilla.org/images/5/55/MobileOpportunity.pdf.)
This can also effect documents with pages in portrait mode, when they are displayed in Presentation Mode.
The reason for this unnecessary re-rendering is that due to limited numerical precision, the new scale value may change in *only* the last decimal place.
This became obsolete in bdeca30fbf. All it does is call the Annotation contructor and add hasHtml. This patch lets the Link and Text annotations directly extend the Annotation class and add hasHtml themselves.
This patch also removes an unused global.
This patch is the the first step towards to addressing issue 6158, which will be done by refactoring the code for setting/getting the current scale in `viewer.js`.
As explained in
https://github.com/mozilla/pdf.js/issues/6174#issuecomment-118502802.
To verify that this patch works:
1. Build the Chrome extension (node make chromium)
2. Load the Chrome extension (at chrome://extensions)
3. Visit https://robwu.nl/pdfjs/issue6174/.
4. Verify that PDF.js is not used to load the PDF. Either Chrome's
default PDF Viewer is used, or the PDF is offered as a file download.
Before this patch, zooming in/out via the scroll wheel caused the page
to be zoomed relative to the upper-left corner of the page, i.e. the
upper-left corner of the page stays at a fixed position.
After this patch, the page is zoomed relative to the cursor position,
i.e. after zooming in/out, the part under the cursor 'has not moved'.
This only applies when the page does not fit in the viewport, because
pages smaller than the viewpoer are always centered.