The following are some highlights of this patch:
- In the Worker we only extract a *subset* of the potential contents of the `Usage` dictionary, to avoid having to implement/test a bunch of code that'd be completely unused in the viewer.
- In order to still allow the user to *manually* override the default visible layers in the viewer, the viewable/printable state is purposely *not* enforced during initialization in the `OptionalContentConfig` constructor.
- Printing will now always use the *default* visible layers, rather than using the same state as the viewer (as was the case previously).
This ensures that the printing-output will correctly take the `Usage` dictionary into account, and in practice toggling of visible layers rarely seems to be necessary except in the viewer itself (if at all).[1]
---
[1] In the unlikely case that it'd ever be deemed necessary to support fine-grained control of optional content visibility during printing, some new (additional) UI would likely be needed to support that case.
Given that the Fetch API is supported since Node.js 18 we should be able to use it when downloading l10n files, which allows us to simplify the code and to make it fully `async`.
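A minimal sketch of what this can look like (the helper name and arguments are illustrative, not the actual code):

```js
import { writeFile } from "fs/promises";

// Node.js 18+ provides `fetch` globally, so no http/https plumbing is needed.
async function downloadL10nFile(url, destinationPath) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to download ${url}: ${response.status}`);
  }
  await writeFile(destinationPath, Buffer.from(await response.arrayBuffer()));
}
```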
The fact that the highlight-thickness can only be changed in "free" mode isn't really obvious visually in the toolbar, so attempt to provide at least some indication of the `disabled`-state by "dimming" the slider.
When implementing caret browsing mode in pdf.js, I didn't notice that selectstart isn't always triggered.
So this patch removes the use of selectstart and relies only on selectionchange.
In order to simplify the selection management, the selection code is moved into the AnnotationUIManager:
- it simplifies the code;
- it allows us to have only one listener for selectionchange, instead of one per visible page for selectstart.
I had to add a delay in the integration tests for highlighting (there's a comment with an explanation); it isn't really nice, but it's the only way I found, and in real life there is always a delay between press and release.
The function caretPositionFromPoint returns the position within the last visible element, and sometimes there are some elements on top of the ones in the text layer.
So the idea is to hide the visible elements which aren't in the text layer, in order to get the right caret position.
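Roughly, the idea looks like this (a sketch only, assuming the text layer elements carry a `textLayer` class and that `caretPositionFromPoint` is available):

```js
// Hide covering elements that don't belong to the text layer, query the
// caret position, then restore the original visibility.
function caretPositionInTextLayer(x, y) {
  const hidden = [];
  try {
    for (const element of document.elementsFromPoint(x, y)) {
      if (element.closest(".textLayer")) {
        break; // The text layer is now the top-most element at this point.
      }
      hidden.push({ element, visibility: element.style.visibility });
      element.style.visibility = "hidden";
    }
    return document.caretPositionFromPoint(x, y);
  } finally {
    for (const { element, visibility } of hidden) {
      element.style.visibility = visibility;
    }
  }
}
```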
*Please note:* This is a micro optimization, hence I fully understand if the patch is rejected.
Currently we create two temporary Arrays and have to iterate twice in total when building the final `hexNumbers` Array.
With this patch there's only one temporary Array and a single iteration required to build the final `hexNumbers` Array.
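The change amounts to something along these lines (sketch):

```js
// Before: `Array(256)` plus the spread result are two temporary Arrays,
// and the spread and `map` iterate twice in total.
const hexNumbersBefore = [...Array(256).keys()].map(n =>
  n.toString(16).padStart(2, "0")
);

// After: `Array.from` with a map-function only needs one temporary Array,
// and a single iteration, to build the final Array.
const hexNumbersAfter = Array.from(Array(256).keys(), n =>
  n.toString(16).padStart(2, "0")
);
```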
When highlighting, the annotation editor layer is disabled to get pointer events from the text layer, but the annotation layer must then be disabled as well in order to avoid bad interactions.
Previously we'd simply export this directly from `web/app_options.js`, which meant that it'd be technically possible to *accidentally* modify the `compatibilityParams` Object when accessing it.
To avoid this we instead introduce a new `AppOptions`-method that is used to lookup data in `compatibilityParams`, which means that we no longer need to export this Object.
Based on these changes, it's now possible to simplify some existing code in `AppOptions` by taking full advantage of the nullish coalescing (`??`) operator.
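A sketch of the approach (the method and property names here are illustrative, not necessarily the actual ones):

```js
const compatibilityParams = { /* build/platform-specific values */ };
const defaultOptions = { /* the regular option definitions */ };

class AppOptions {
  static #userOptions = Object.create(null);

  // New lookup method; the `compatibilityParams` Object itself is no
  // longer exported, hence it cannot be accidentally modified.
  static getCompat(name) {
    return compatibilityParams[name] ?? undefined;
  }

  static get(name) {
    // Nullish coalescing checks user options, compatibility values and
    // defaults in a single expression.
    return (
      this.#userOptions[name] ?? this.getCompat(name) ?? defaultOptions[name]
    );
  }
}
```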
Given that the "PREFERENCE" kind is used e.g. to generate the preference-list for the Firefox PDF Viewer, those options need to be carefully validated.
With this patch we'll now check this unconditionally in development mode, during testing, and when creating the preferences in the gulpfile.
As part of the changes in PR 17686 we "accidentally" enabled source-maps for the *minified* builds, which seems unnecessary since those have never been included in the `pdfjs-dist` output.
Locally this patch reduces the run-time of `gulp minified` by ~15 percent.
Rather than first building the library and then using Terser "manually" to minify the files, we can utilize a Webpack plugin to combine these steps, which helps to simplify the gulpfile.
The `handler` method contained this code in two inline functions,
triggered via callbacks, which made the `handler` method big and harder
to read. Moreover, this code relied on variables from the outer scope,
which made it harder to reason about because the inputs and outputs
weren't easily visible.
This commit fixes the problems by extracting the request checking code
into a dedicated private method, and modernizing it to use e.g. `const`/
`let` instead of `var` and using template strings. The logic is now
self-contained in a single method that can be read from top to bottom
without callbacks and with comments annotating each check/section.
- Run the minification in "parallel" since that should be a *tiny* bit more efficient.
- Don't rename the minified files since that seems unnecessary, especially considering that they are only used in the `dist-pre` target where we currently change the name back manually.
After the changes in PR 17637 there's no longer any reason to invoke `tweakWebpackOutput` without an argument, since the `__non_webpack_import__` re-writing was moved into the Babel plugin.
This way we can avoid a (little) bit of unnecessary parsing during building.
The specs are unclear about what kind of xref table format must be used.
From checking the validity of some PDFs in the preflight tool from Acrobat, we can guess that using the same format is the correct approach.
The PDF in the mentioned bug, after having been changed, wasn't displayed correctly in either Chrome or Acrobat: it's now fixed.
Note how we're using custom `__non_webpack_import__`-calls in the code-base, that we replace during the post-processing stage of the build, to be able to write `import`-calls that Webpack will leave alone during parsing.
This work-around is necessary since we let Babel discard all comments, given that we generally don't need/want them in the builds, which is why we cannot utilize `/* webpackIgnore: true */`-comments in the source-code.
After the changes in PR 17563 it thus seems to me that we should be able to just move this re-writing into the Babel plugin instead.
The `handler` method contained this code in an inline function, which
made the `handler` method big and harder to read. Moreover, this code
relied on variables from the outer scope, which made it harder to reason
about because the inputs and outputs weren't easily visible.
This commit fixes the problems by extracting the directory listing code
into a dedicated private method, and modernizing it to use e.g. `const`/
`let` instead of `var` and using template strings.
The `handler` method contained this code in an inline function, which
made the `handler` method big and harder to read. Moreover, this code
relied on variables from the outer scope, which made it harder to reason
about because the inputs and outputs weren't easily visible.
This commit fixes the problems by extracting the range file serving code
into a dedicated private method, and modernizing it to use e.g. `const`/
`let` instead of `var` and using template strings.
The `handler` method contained this code in an inline function, which
made the `handler` method big and harder to read. Moreover, this code
relied on variables from the outer scope, which made it harder to reason
about because the inputs and outputs weren't easily visible.
This commit fixes the problems by extracting the file serving code into
a dedicated private method, and modernizing it to use e.g. `const`/`let`
instead of `var` and using template strings.
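A condensed sketch of the extraction pattern shared by these commits (all names are illustrative):

```js
import fs from "fs";

class WebServer {
  handler(request, response) {
    // Inputs are computed here and passed explicitly, rather than being
    // captured from the outer scope by an inline callback.
    const filePath = this.#resolvePath(request.url);
    const { size } = fs.statSync(filePath);
    this.#serveFile(response, filePath, size);
  }

  #resolvePath(url) {
    return `.${url}`; // Placeholder path resolution.
  }

  // A self-contained private method that reads from top to bottom.
  #serveFile(response, filePath, fileSize) {
    response.writeHead(200, { "Content-Length": fileSize });
    fs.createReadStream(filePath).pipe(response);
  }
}
```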
Currently the `web/app.js` file pulls in various build-specific dependencies, via the use of import maps, and those files in turn import from `web/app.js` thus creating undesirable import cycles.
To avoid this we instead pass in a `PDFViewerApplication`-reference, immediately after it's been created, to the relevant code.
Note that we use an ESLint plugin rule, see `import/no-cycle`, that is normally able to catch import cycles. However, in this case import maps are involved which is why this wasn't caught.
Looking at the *built* files you'll notice some lines containing nothing more than a semicolon. This is the result of (mostly top-level) `if`-statements, which include `PDFJSDev`-checks, that evaluate to `false` during Babel parsing.
This has always annoyed me a bit, and looking at the Babel plugin it seems that we can fix this simply by *removing* the relevant nodes.
This part of the (modern) preprocessor is now dead code, since we no longer use `require` statements anywhere in the main code-base.
Note that as part of the changes leading up to PDF.js version `4` we removed all[1] the remaining `require` statements, and we also have an ESLint rule to ensure that no new ones are accidentally added.
---
[1] With two small exceptions, in benchmarking-code and in the Webpack-example.
Given that we need to pass in a `PDFDataRangeTransport`-instance a number of the needed parameters can be obtained from it, rather than having to specify them manually.
This should be a *tiny* bit more efficient, since it avoids parsing substrings that we don't care about.
*Please note:* I cannot find an ESLint rule to enforce this automatically.
- Ensure that localization works in the GENERIC viewer, even if the necessary locale files cannot be loaded.
This was the behaviour prior to the introduction of Fluent, and it seems worthwhile to keep that (especially since we already bundle the en-US strings anyway).
- Let the `GenericL10n`-implementation use the *bundled* en-US strings directly when no language is provided.
- Remove the `NullL10n`-implementation, and simply fallback to `GenericL10n`, to reduce the maintenance burden of viewer-components localization.
- Indirectly, given the previous point, stop exporting `NullL10n` in the viewer-components since it's now removed.
Note that it was never really intended to be used directly and only existed as a fallback.
*Please note:* This doesn't affect the Firefox PDF Viewer, thanks to the use of import maps.
When a highlight is self-intersecting, the outline was drawn inside it.
In order to remove it, we use an SVG mask to exclude the inner part of the shape when drawing the outlines.
That changes the outline from 1px white, 2px blue, 1px white to 2px white, 2px blue: the part of the stroke which is inside the shape is removed by the mask.
All of our static evaluation & dead-code elimination transforms need to
happen in post-order, transforming inner nodes first. This is so that
in complex nested cases all transforms see the simplified version of
their inner nodes.
For example:
  async getNimbusExperimentData() {
    if (!PDFJSDev.test("GECKOVIEW")) { return null; }
    // other code
  }

-> [evaluation of PDFJSDev.*]

  async getNimbusExperimentData() {
    if (!false) { return null; }
    // other code
  }

-> [!false -> true]

  async getNimbusExperimentData() {
    if (true) { return null; }
    // other code
  }

-> [if (true) -> replace with the if branch]

  async getNimbusExperimentData() {
    return null;
    // other code
  }

-> [early return -> remove dead code]

  async getNimbusExperimentData() {
    return null;
  }
This was done correctly in all cases except for our `UnaryExpression`
transform, which was happening in pre-order.
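The fix amounts to moving the transform from `enter` to `exit`; a minimal sketch of a post-order Babel visitor (much simplified compared to the real plugin):

```js
export default function ({ types: t }) {
  return {
    visitor: {
      UnaryExpression: {
        // `exit` runs in post-order, i.e. after the inner nodes have been
        // transformed, so `!false` is only folded once the argument has
        // already been simplified to a literal.
        exit(path) {
          const { node } = path;
          if (node.operator === "!" && t.isBooleanLiteral(node.argument)) {
            path.replaceWith(t.booleanLiteral(!node.argument.value));
          }
        },
      },
    },
  };
}
```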
Having this parameter among a list of DOM-elements seems slightly strange now, however this is very old code and the reason for doing it this way is most likely historical (as is often the case).
We can thus simply move this into `AppOptions` instead, which seems more appropriate overall.
Given that only the GENERIC viewer supports opening more than one PDF document, we can simplify things a tiny bit by instead generating the necessary DOM-element in JavaScript.
This unit-test is now failing in up-to-date versions of Node.js and Chromium-based browsers, since `CompressionStream` no longer produces consistent data across all environments/browsers.
However, logging the compressed TypedArray produced by `writeStream` in Firefox and Chrome respectively, and then feeding *both* of those TypedArrays as input to `DecompressionStream`, produced the same (correct) result in both browsers.
Hence the *exact* output of `CompressionStream` shouldn't matter, as long as we're able to successfully decompress it when the resulting PDF document is opened with the PDF.js library, and the unit-test is thus extended to check this.
Starting with Chrome 120.0.6099.109 (shipped with Puppeteer 21.8.0+) the
unit test fails in Chrome as well. The issue is tracked in #17399, but
for now we'll only run the unit test in Firefox so we can continue to
update Puppeteer while also still having a browser in which it runs,
until we figure out why the behavior of `CompressionStream` changed.
The `DefaultExternalServices` code, which is used to provide build-specific functionality, is very old. This results in a pattern where we first initialize `PDFViewerApplication.externalServices` and then *override* it for the different builds.
By converting `DefaultExternalServices` into a "regular" class, and leveraging import maps, we can directly initialize the correct instance depending on the build.
Given the simplicity of the `createPreferences` method, we can leverage import maps to directly initialize the correct `Preferences`-instance depending on the build.
Given the simplicity of the `createDownloadManager` method, we can leverage import maps to directly initialize the correct `DownloadManager`-instance depending on the build.
The latest mozilla-central update has test failures, because some CSS variables are not "properly" referenced; in particular:
- Give `--hcm-highlight-selected-filter` a default value, of `none`, similar to the previously existing HCM filter.
- Remove the `--mix-blend-mode` variable, since it's unused.
It isn't really a fix for the mentioned bug, but it slightly improves things.
By reducing the memory use, the time spent in the GC is reduced as well.
The algorithm to compute the bounding box is the same as before; it has just been rewritten to be more efficient.
This commit converts the pdfjsdev-loader transform into a Babel plugin, to skip an AST->string->AST round-trip.
Before this commit, the webpack build process was:
1. Babel parses the code
2. Babel transforms the AST
3. Babel generates the code
4. Acorn parses the code
5. pdfjsdev-loader transforms the AST
6. @javascript-obfuscator/escodegen generates the code
7. Webpack parses the file
8. Webpack concatenates the files
After this commit, it is reduced to:
1. Babel parses the code
2. Babel transforms the AST
3. babel-plugin-pdfjs-preprocessor transforms the AST
4. Babel generates the code
5. Webpack parses the file
6. Webpack concatenates the files
This change improves the build time by ~25% (tested on MacBook Air M2):
- `gulp lib`: 3.4s to 2.6s
- `gulp dist`: 36s to 29s
- `gulp generic`: 5.5s to 4.0s
- `gulp mozcentral`: 4.7s to 3.2s
The new Babel plugin doesn't support the `saveComments` option of
pdfjsdev-loader, and it just always discards comments. Even though
pdfjsdev-loader supported multiple values for that option, it was
effectively ignored due to `acorn` dropping comments by default.
This is in preparation for the next commit, which will convert
preprocessor2.mjs to a Babel plugin. The purpose of this commit
is to help git track the rename regardless of the large amount
of changes.
In the Gulpfile only the exit codes of `test.mjs` child processes
erroneously aren't checked. This causes failures in `test.mjs` to be
logged but not propagated to the master process, which in turn causes
test runners such as GitHub Actions to succeed because they only
monitor the master process. This is easy to reproduce by throwing an
error at the top of `test.mjs` and running `gulp makeref` or `gulp
unittest`: the error is logged, but the task that spawned the child
process succeeds and the master process exits with exit code 0. This is
problematic because it can easily cause errors to go by unnoticed.
This commit fixes the issue by making sure that the `test.mjs`
invocations are handled in the same way as the other child processes
in the file, i.e., if the child process exits with a non-zero exit code
then the master process also exits with a non-zero exit code. After this
patch the error is still logged, but the task now also fails and the
master process exits with exit code 1 to properly signal failure.
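A sketch of the relevant handling (simplified; the helper name is illustrative):

```js
import { spawn } from "child_process";

function runTests(args) {
  return new Promise((resolve, reject) => {
    const testProcess = spawn("node", ["test.mjs", ...args], {
      stdio: "inherit",
    });
    testProcess.on("close", code => {
      if (code !== 0) {
        // Propagate the failure, so that the gulp task (and thereby the
        // master process) also fails with a non-zero exit code.
        reject(new Error(`test.mjs exited with code ${code}`));
        return;
      }
      resolve();
    });
  });
}
```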
This manually ignores some cases where the resulting auto-formatting would not, as far as I'm concerned, constitute a readability improvement or where we'd just end up with more overall indentation.
Please see https://eslint.org/docs/latest/rules/arrow-body-style
For arrow functions that are both simple and short, we can avoid using explicit `return` to shorten them even further without hurting readability.
For the `gulp mozcentral` build-target this reduces the overall size of the output by just under 1 kilobyte (which isn't a lot but still can't hurt).
The `if` statement is no longer necessary because the Node.js versions
that didn't provide `dns.setDefaultResultOrder` are no longer supported,
but looking into this a bit more it turns out that the entire workaround
is no longer necessary because the issue got fixed in Firefox 105 in bug
1769994. Indeed, Firefox now starts nicely with the workaround removed.
Reverts 60ed3cd297c4045b90f4114a74e5baa4ef1c5056.
In order to do that we must change the text layer opacity to 1, but that has several implications:
- the selection color must have an alpha component;
- the background color of the span used for highlighted words must have an alpha component as well, but now that the opacity is 1 we can use some backdrop-filters in HCM, making the highlighted words more visible;
- fix a regression caused by #17196: the CSS variable --hcm-highlight-filter has to live under the #viewer element, because in HCM it's overwritten by JS at this level, hence link annotations for example didn't have the right colors when hovered.
The free highlighting is enabled when the mouse pointer isn't on some text.
Then we draw a shape with smoothed borders corresponding to the movement of
the mouse.
Printing/saving and changing the thickness will come later.
and try to load the font family (guessed from the font name) before trying
the local substitution.
The local(...) command expects a real font name and not a predefined substitution, which is why we try the font family first.
By always removing the "visibilitychange" listener in the `PDFViewer.#onePageRenderedOrForceFetch`-method we can (ever so slightly) reduce duplication in the code.
Ensure that users cannot provide incorrect values when trying to set the global worker-options.
This patch was prompted by occasionally seeing users manually loading the `pdf.worker.mjs`-file and then assigning it to the `workerSrc`-option, something that obviously doesn't make sense and will cause fake-workers to be used (with poor performance as a result).
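A sketch of the kind of validation meant here (illustrative, not the exact code):

```js
class GlobalWorkerOptions {
  static #src = "";

  static get workerSrc() {
    return this.#src;
  }

  static set workerSrc(val) {
    // Assigning e.g. a loaded worker module makes no sense here; only a
    // string URL pointing at the worker file is accepted.
    if (typeof val !== "string") {
      throw new Error("Invalid `workerSrc` type.");
    }
    this.#src = val;
  }
}
```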
When the text of an annotation is extracted using getTextContent, consecutive white spaces are just replaced by a single space. So this patch adds an option to make sure that white spaces are preserved when the appearance is parsed.
For the case where there's no appearance, we can have a fast path to get the correct string from the Contents entry.
When an existing FreeText is edited, spaces (0x20) are replaced by non-breakable ones (0xa0), to make all of them visible on screen.
The system locale (used in OffscreenCanvas) can be different from the one guessed by Fluent; consequently, in order to avoid any mismatch, we just use an attached canvas element.
The original issue can easily be reproduced locally by adding lang="ja" in viewer.html (or with another language for Japanese users).
With modern JavaScript class features we can move the relevant event handling into private methods, and thus invoke it directly when resetting the toolbar UI-state.
*Please note:* This patch slightly reduces the size of the `web/secondary_toolbar.js` file.
With modern JavaScript class features we can move the relevant event handling into private methods, and thus invoke it directly when resetting the toolbar UI-state.
*Please note:* This patch slightly reduces the size of the `web/toolbar.js` file.
In PR 11912 we started caching images that occur on multiple pages globally, which improved performance a lot in many PDF documents.
However, one slightly annoying limitation of the implementation is the need to re-parse the image once the global-caching threshold has been reached. Previously this was difficult to avoid, since large image-resources will cause cleanup to run on the main-thread after rendering has finished. In PR 16108 we started delaying this cleanup a little bit, to improve performance if a user e.g. zooms and/or rotates the document immediately after rendering completes.
Taking those two PRs together, we now have a situation where it's much more likely that the main-thread has "globally used" images cached at the page-level. Hence we can instead attempt to *copy* a locally cached image into the global object-cache on the main-thread and thus reduce unnecessary re-parsing of large/complex global images, which significantly reduces the rendering time in many cases.
For the PDF document in issue 11878, the rendering time of *the second page* changes as follows (on my computer):
- With the `master`-branch it takes >600 ms to render.
- With this patch that goes down to ~50 ms, which is one order of magnitude faster.
(Note that all other pages are, as expected, completely unaffected by these changes.)
This new main-thread copying is limited to "large" global images, since:
- Re-parsing of small images, on the worker-thread, is usually fast enough to not be an issue.
- With the delayed cleanup after rendering, it's still not guaranteed that an image is available in a page-level cache on the main-thread.
- This forces the worker-thread to wait for the main-thread, which is a pattern that you always want to avoid unless absolutely necessary.
This commit changes the code to use a template string and to use `const`
instead of `var`. Combined with the previous commits this allows for
enabling the ESLint `no-var` rule for this file now.
The test helper code largely predates the introduction of modern
JavaScript features and should be refactored to improve readability.
In particular callbacks make the code harder to understand and maintain.
This commit:
- replaces the callback argument with returning a promise;
- replaces the recursive function calls with a simple loop;
- uses `const`/`let` instead of `var`;
- uses arrow functions for shorter code;
- uses template strings for shorter string formatting code.
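A condensed sketch of the callback-to-promise conversion (the status-check helper is hypothetical):

```js
// Hypothetical helper, standing in for the real status check.
let attempts = 0;
async function checkStatus() {
  return { done: ++attempts >= 3 };
}

// After the refactoring: a promise-returning function with a simple loop,
// instead of a callback argument and recursive function calls.
async function waitUntilDone() {
  let status;
  do {
    status = await checkStatus();
  } while (!status.done);
  return status;
}
```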
The test helper code largely predates the introduction of modern
JavaScript features and should be refactored to improve readability.
In particular callbacks make the code harder to understand and maintain.
This commit:
- replaces the callback argument with returning a promise;
- uses `const` instead of `var`;
- uses arrow functions for shorter code;
- uses template strings for shorter string formatting code;
- uses `Array.includes` for shorter response code checking code.
This test intermittently fails, likely because the auto-print is triggered fast enough that we don't manage to catch it.
So this patch sets a listener very early, in order to be sure that we'll be aware that a print has been triggered.
It seems this unit-test now fails consistently in "all" up-to-date Node.js versions. We should probably try and understand why, but for now just disable it to get passing CI tests.
There's obviously a few things wrong with the Annotations in the referenced PDF document, however parsing of an Annotation shouldn't just break if the /BS-entry isn't a dictionary.
When opening a pdf from the secondary toolbar, a second color picker is added.
In order to avoid that, we just stop listening for annotationeditoruimanager in the toolbar.
The doorhanger for highlighting has a basic color picker composed of 5 predefined colors
to set the default color to use.
These colors can be changed thanks to a preference for now but it's something which could
be changed in the Firefox settings in the future.
Each highlight has in its own toolbar a color picker to just change its color.
The different color pickers are so similar (modulo a few differences in their styles) that this patch introduces a new class ColorPicker, which provides a color picker component that could be reused in future editors.
All in all, a large part of this patch is dedicated to the color picker itself and its style, and the rest is almost a matter of wiring up the component.
When a pdf has a FreeText without an appearance, we use a fake font in order to render it, and that leads to creating a few new refs for the font.
But then when saving, we create some new refs which start at the same number as the previously created ones.
Consequently, when saving we're using some wrong objects (like a font) to check if we're able to render the newly added FreeText.
In order to fix this bug, we just remove the persistent refs (which are only used when rendering/printing) during saving.
Having just tested PR 17337 locally I noticed that especially the `JpxImage`-test causes a "ridiculous" amount of warning messages to be printed, which doesn't seem helpful.
Given that only actual `Error`s should be relevant here, we can easily disable this logging during the tests.
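This can be done with the existing verbosity handling; a sketch (the import path is an assumption):

```js
import { setVerbosityLevel, VerbosityLevel } from "../../src/shared/util.js";

// Only actual Errors are relevant for these tests, so silence `warn` calls.
setVerbosityLevel(VerbosityLevel.ERRORS);
```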
The test helper code largely predates the introduction of modern
JavaScript features and should be refactored to improve readability.
In particular callbacks and recursive function calls make the code
harder to understand and maintain.
This commit:
- replaces the callback argument with returning a promise;
- replaces the recursive function calls with a simple loop;
- uses `const`/`let` instead of `var`;
- uses template strings for shorter string formatting code;
- improves the error messages to have more details.
The test helper code largely predates the introduction of modern
JavaScript features and should be refactored to improve readability.
In particular callbacks and recursive function calls make the code
harder to understand and maintain.
This commit:
- replaces the callback argument with returning a promise;
- uses `const` instead of `var`;
- uses arrow functions for shorter code;
- uses template strings for shorter string formatting code;
- improves the error messages to have more details.
This unfortunately broke in PR 17060, since I had completely forgotten about https://bugzilla.mozilla.org/show_bug.cgi?id=1632644#c5 when writing that patch.
The easiest solution, while slightly unfortunate, seems to be to add a couple of non-standard hash parameters specifically for the PDF attachment use-case in the Firefox PDF Viewer. (Note that we cannot use "nameddest" here, since we also need to support the stringified destination-Array case.)
It seems this unit-test started failing in Node.js version 20.10 as well. We should probably try and understand why, but for now just disable it to get passing CI tests.
Given that this event listener is only used to trigger rendering after the sidebar has been opened/closed, we can utilize the existing one in the `PDFSidebar` class for this purpose instead. That one is registered on the sidebar DOM-element, and is needed to remove a CSS-class indicating that the sidebar is moving.
This fixes a few errors in the CSS for HCM.
It now complies with the specs from UI/UX.
Only the foreground must change in HCM and not the background, similarly to what we had for the alt-text button before moving it.
After the two previous commits, which removed the remaining call-sites, this method is no longer used and can thus be removed.
As mentioned in the JSDocs for the now removed method, synchronous communication between the viewer and the platform code isn't really a good idea.
Once this patch has landed in mozilla-central some additional clean-up of the platform code will also be possible.
The return value is not, nor has it ever been, used for anything and we should thus be able to just send the message.
Note that the responses are already handled by the "message" event listener registered above.
This commit fixes the JSDoc comment for the `annotationEditorMode` setter.
The types tests fail on that now because the input value was changed from
a number to an object with various properties in recent patches, but the
JSDoc comment was not updated accordingly.
Moreover, the types tests also fail because TypeScript 5.3 assumes that
getters and setters have equal return and input value types, which is
arguably also what one would expect, but our `annotationEditorMode`
getter and setter deviate from that because the getter returns a number
while the setter accepts an object. Given that it seems more important
to document the setter entirely, including the meaning and types of its
properties, and the type of the getter can easily be inferred from this
comment and the other JSDoc comments that have `annotationEditorMode` in
it, we remove the getter type to make the types tests pass again.
- Extend the `fetchData` helper function to also support fetching of "blob" data.
- Use the `fetchData` helper function more in the code-base, when fetching non-PDF data. Given that the Fetch API isn't supported for all protocols, this should improve compatibility for the PDF.js library.
Currently the SVG images for the loading-icons exist in two versions, for the light and dark themes respectively, which nowadays are the only "duplicated" icons left.
The reason for this is that these icons are being used in `input`-elements, where the regular `mask-image` approach used for all buttons doesn't work.
To address this we add containers for the `input`-elements, such that we have a "regular" DOM-element where we can use `mask-image`.
The goal is to be able to get these outlines to fill the shape corresponding to a text selection, in order to highlight some text contents.
The outlines will be used to show selected/hovered highlights.
- Re-factor the existing `fetchData` helper function such that it can fetch more types of data, and it now supports "arraybuffer", "json", and "text".
This only needed minor adjustments in the `DOMCMapReaderFactory` and `DOMStandardFontDataFactory` classes.[1]
- Expose the `fetchData` helper function in the API, such that the viewer is able to access it.
- Use the `fetchData` helper function in the `GenericL10n` class, since this should allow fetching of localization-data even if the default viewer is run in an environment without support for the Fetch API.
---
[1] While testing this I also noticed a minor inconsistency when handling standard font-data on the worker-thread.
This is consistent with the implementation used in the (now removed) webL10n-library, and by only using lowercase language-codes internally in the `L10n`-implementations we should avoid future issues e.g. when users manually set the `locale`-option (in the default viewer).
Some fields, somewhere under the Fields entry in the AcroForm, could have no name (in T) but have a parent which has a name yet isn't itself under Fields.
As a side-effect, this patch prevents infinite loops caused by potential cycles under Fields.
This commit migrates the font tests away from the bots. Not only are the
font tests broken on the Windows bot since some time, they also run on
Python 2 (end of life since January 2020) and `ttx` 3.19.0 (released in
November 2017). The latter is installed via a submodule, which requires
more complicated logic for finding and running `ttx`.
We solve the issues by implementing a modern workflow that installs the
most recent stable Python and `ttx` (`fonttools` package) versions. This
simplifies the `ttx` driver code as well because it can now assume `ttx`
is available on the path (just like we do for e.g. `node` invocations).
GitHub Actions takes care of creating a virtual environment with
`fonttools` in it so that the `ttx` entrypoint is available. Locally
the font tests can be run in a similar way by creating and sourcing a
virtual environment with `fonttools` in it before running the font
tests, and a README file is included with instructions for doing so.
This commit prepares for running the font tests on GitHub Actions where
we can't spin up headful browsers because there are no display
capabilities on the workers. This will also be useful for porting other
test targets to GitHub Actions at a later time, as well as running the
tests locally in headless mode.
This commit prepares for the introduction of extra options in later
commits by changing the function signatures of the `startBrowser(s)`
functions to take parameter objects instead of plain parameters. This
makes the call sites explicitly state which parameters they pass,
improving overall readability as well.
The current logic is more complicated than it needs to be because it's
passing a callback function to `startBrowsers` instead of a string.
This commit simplifies the logic by passing the base URL as a string to
`startBrowsers` and having it do further augmentation internally,
thereby removing all indirection of the function calls to `makeTestUrl`
and the inner function it returned.
After recent PRs the size and scope of the CI workflow is now reduced, and this patch tries to simplify things further. More specifically we can directly specify the gulp-tasks in the workflow, and thus clean-up the `gulpfile` a tiny bit.
Note that this will technically be slower, since the tests are now run in series (rather than in parallel), however `gulp externaltest` runs so quickly that it really won't matter in practice.
Hopefully this is enough to address the problem of initializing the Worker in Chromium-based browsers.
Locally I've tried to *force* use of `createCDNWrapper` in development mode, by commenting out the `isSameOrigin` checks, and worker-loading fails against `master` and works with this patch.
Currently the background-color of the `editorParamsToolbar`s doesn't match that of the arrow, which is especially noticeable in dark mode (see zoomed-in screen-shots below).
The simplest solution seems to be to just style the `editorParamsToolbar`s like the `secondaryToolbar`, to limit the amount of CSS changes required.
This should *hopefully* fix 17228, by tweaking the build scripts to give the GENERIC viewer something to await to avoid breaking third-party users of the standalone viewer components.
This button is *only* used in the GENERIC viewer, and will currently be visible either in the main or secondary toolbars (depending on the viewer width).
To simplify upcoming changes, and to avoid then having to complicate the relevant CSS rules unnecessarily, let's place the "Open file"-button permanently in the secondary toolbar instead.
(Note that the GENERIC viewer also, since five years, supports drag-and-drop in order to open local files.)
The `fieldObjects`-getter is implemented in the `PDFDocument` class, which means that the `this._localIdFactory`-property that we pass to `AnnotationFactory.create` doesn't actually exist.
The reason that this hasn't caused any bugs, that I'm aware of, is that all /Fields-entries need to be References to actually make sense.
The `fieldObjects`-getter itself is called, from `src/core/worker.js`, in a way that'll ensure that any `MissingDataException`s are handled. However the problem is that the actual data-lookups in `fieldObjects` and `#collectFieldObjects` are done inside of a Promise, which means that `MissingDataException`s won't be handled and parsing could thus break.
To address this we change all data-lookups to be asynchronous instead.
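A much simplified sketch of the kind of change in `#collectFieldObjects`:

```js
class PDFDocument {
  constructor(xref) {
    this.xref = xref;
  }

  // The data-lookup is awaited, so a MissingDataException now leads to the
  // data being fetched, rather than breaking parsing inside of the Promise.
  async collectFieldObject(fieldRef) {
    return this.xref.fetchIfRefAsync(fieldRef); // was: fetchIfRef(fieldRef)
  }
}
```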
The `viewerCssTheme`-implementation has always been somewhat hacky, and it's now also *partially* broken since we've started using CSS nesting.
Trying to support nested media queries would thus require a lot more parsing of the CSS rules, which seems inefficient and thus generally undesirable.[1]
As discussed on Matrix, let's try to remove the `viewerCssTheme`-option and see if there's any (significant) fallout from this.
---
[1] If this option is brought back, it seems to me that it (in Firefox) should probably be set through the platform-code that handles theming.
Depending on the structure of the outline we could potentially need to expand a few levels, especially in long PDF documents, hence it cannot hurt to pause translation in that case as well.
With the changes in PR 17208, where browser-preferences are now handled as "regular" viewer-options, we can tweak the definition of `canvasMaxAreaInBytes` to slightly simplify things in the `PDFViewerApplication.open` method.
Hopefully this will allow us to catch bugs in new Node.js versions earlier, rather than having to wait for bug reports.
Given that `CompressionStream` is (currently) only potentially used when saving a *modified* PDF document, which is unlikely to be a common use-case in Node.js environments, let's just disable the affected unit-test for now.
Given that this branch is only necessary in development mode and *during* building, but is never actually used in the final viewer-bundles, we can utilize the pre-processor to ignore this code.
Currently we *synchronously* fetch a number of browser preferences/options, from the platform code, during the viewer respectively PDF document initialization paths.
This seems unnecessary, and we can re-factor the code to instead include the relevant data when fetching the regular viewer preferences.
The active LTS version is now based on Node.js version 20, hence let's update the relevant workflows to use that one instead; see https://en.wikipedia.org/wiki/Node.js#Releases
Given that we still support Node.js version 18, i.e. the maintenance LTS version, in the PDF.js library we'll keep testing both versions in GitHub Actions to prevent regressions.
I noticed the following warning in the GitHub Actions workflow logs:
`Configuration file not found: .github/linter_config.yml`
The configuration file is called `fluent_linter_config.yml` instead, so
this commit fixes the path so it points to the correct file.
Fixes 487816b.
The current stable version of Python is Python 3.12, see
https://www.python.org/downloads, so we should switch to that since
Python 3.10 is older and only receives security updates.
This commit tweaks the Fluent linter workflow to match the other
workflow files we have, so we make sure the steps have a newline between
them for better readability and align names and descriptions of steps
with how they are called in the other workflow files we have.
There are environments that include *incomplete* polyfills for the `navigator`-object, which may thus cause the PDF.js library to break.
Despite that clearly not being our fault, it may still result in bug reports filed against the PDF.js project; see e.g. 15728.
Currently this even seems to affect *the latest* version of Node.js; see e.g. [here].
*Please note:* Thanks to the pre-processor none of these changes affect the Firefox PDF Viewer, however it does add "overhead" when working with and reviewing the affected code (which is why I'm not crazy about this).
*Please note:* While following the steps in the README still works with this patch, in the sense that the example runs and successfully renders a PDF document, I unfortunately cannot tell if it illustrates Webpack best practices.
- Remove the `errorWrapper`-element, since it simplifies the example and is consistent with the default viewer; see PR 15533.
- Simplify the l10n-handling, since the `NullL10n` should be able to translate everything e.g. without fallback values; see PR 17146.
Note that we must append the textLayer to the DOM *before* enabling the `highlighter` and `accessibilityManager`, to avoid breaking e.g. a pending searching operation.
The least invasive solution, that I was able to come up with, is to introduce a new `TextLayerBuilder` callback-function for this purpose.
Currently the `WidgetAnnotationElement._getKeyModifier` method will always be falsy on Linux, which seems like a simple oversight. Looking at all the other `FeatureTest.platform` accesses we only handle the `isMac`-case specially, and it seems reasonable to do the same thing here.
The reason that this hasn't led to any bug reports is most likely that the `modifier`-property seems completely unused in the scripting-implementation.
Finally, with these changes we can (slightly) simplify the `FeatureTest.platform` implementation.
After PR 17177 the interface of `XfaLayerBuilder` is now inconsistent, since whether or not we directly append the xfaLayer to the DOM now depends on the rendering intent.
I forgot to include `web/l10n_utils.js` in PR 17161, which currently breaks `ConstL10n` since there's no longer a method called `setL10n`; sorry about that!
Most of the strings shouldn't contain special chars (<= 0x1F), so we can have a fast path which just checks if the string contains at least one such char.
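A sketch of such a fast path (the actual escaping format is illustrative):

```js
// eslint-disable-next-line no-control-regex
const specialChars = /[\x00-\x1f]/;

function escapeSpecialChars(str) {
  if (!specialChars.test(str)) {
    return str; // Fast path: most strings contain no special chars at all.
  }
  // Slow path: escape each special char.
  // eslint-disable-next-line no-control-regex
  return str.replaceAll(
    /[\x00-\x1f]/g,
    c => `#${c.charCodeAt(0).toString(16).padStart(2, "0")}`
  );
}
```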
To prevent the *standalone* viewer-components from breaking, we need to ensure that the `NullL10n`-interface won't accidentally diverge from the actual `L10n`-implementations.
Looking at the `PDFThumbnailView.setPageLabel` method you'll see that we update e.g. the "aria-label" of the thumbnail-image for documents that contain (valid) pageLabels.
This isn't done in `PDFPageView`, which seems inconsistent, hence this patch.
This patch changes almost all viewer-components[1] to use "data-l10n-id"/"data-l10n-args" for localization, which means that in many cases we no longer need to pass around the `L10n`-instance any more.
One part of the code-base where the `L10n`-instance is still being used "directly" is the AnnotationEditors, however while it might be possible to convert (most of) that code as well that's not attempted in this patch.
---
[1] The one exception is the `PDFDocumentProperties` dialog, since the way it's currently implemented makes that less straightforward to fix without a lot of code changes.
*Please note:* In the Firefox PDF Viewer this findbar is only used for PDF documents placed in e.g. `<iframe>` elements.
By registering a `ResizeObserver` when the `PDFFindBar` is open we slightly unify and simplify how the findbar layout (row vs column) is handled.
This will be especially helpful with upcoming changes, where we'll make use of "data-l10n-id"/"data-l10n-args" to trigger translation in the viewer.
- The old translation engine handled language code casing slightly differently, hence we need to tweak the non-metric locale check in `PDFDocumentProperties` to account for that.
- Use only lowercase names for the pre-defined page names, to improve overall consistency.
*Please note:* These changes only affect the GENERIC build, since `NullL10n` is only a stub elsewhere (see PR 17135).
After the changes in PR 17115, which modernized and improved l10n-handling, the `NullL10n`-implementation is no longer a good fallback for the "proper" `L10n`-classes.
To improve this situation, especially for the *standalone* viewer-components, this patch makes the following changes:
- Let the `NullL10n`-implementation extend an actual `L10n`-class, which is constant and lazily initialized, to ensure that it works *exactly* like the "proper" ones.
- Automatically bundle the "en-US" l10n-strings in the build, via the pre-processor, such that we don't need to remember to manually update them.
- Ensure that the *standalone* viewer-components register their DOM-elements for translation, similar to the default viewer, since this will allow future code improvements by using "data-l10n-id"/"data-l10n-args" in most (if not all) parts of the viewer.
- Remove the `NullL10n` from the `AnnotationLayer`, to avoid affecting bundle size too much.
For third-party users that access the `AnnotationLayer`, as exposed in the main PDF.js library, they'll now need to *manually* register it for translation. (However, the *standalone* viewer-components still works given the point above.)
Use existing helper to calculate the Box
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Ensure that there are non-zero
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Add a reference test for #17147
Given that there's now a bit more asynchronicity in the l10n-initialization in the Firefox PDF Viewer, after PR 17115, try to limit the impact of that by moving it to occur a tiny bit earlier in the default viewer initialization.
In Firefox debug builds, there is an assertion to check that we don't connect
a subelement of an already connected root. Thanks to this assertion, we can see
that the root has already been added to Fluent, hence we don't need to do it
a second time.
We don't need to await the translation anymore in order to update the toolbar: it'll be done by Fluent, so we can safely remove the "localized" event and avoid waiting for it.
*Please note:* This patch contains a couple of micro-optimizations, hence I understand if it's deemed unnecessary.
Move the `AppOptions` initialization into the `Preferences` constructor, since that allows us to remove a couple of function calls, a bit of asynchronicity and one loop that's currently happening in the early stages of the default viewer initialization.
Finally, move the `Preferences` initialization to occur a *tiny* bit earlier since that cannot hurt given that the entire viewer initialization depends on it being available.
Note that CSS-features such as e.g. `flex` didn't exist, or had poor cross-browser support, back when the JavaScript-based solution was initially implemented.
- For the generic viewer we use @fluent/dom and @fluent/bundle.
- For the built-in pdf viewer in Firefox, we set a localization URL and then rely on document.l10n, which is a DOMLocalization object.
Currently we're unnecessarily converting data between strings and typed-arrays, when dealing with compressible data, in the `writeStream` function.
Note how we're *first* getting a string-representation of the stream, which involves converting the underlying typed-array into a string, only to immediately convert this back into a typed-array. This seems completely unnecessary, and is easy enough to avoid, and we'll now only do a *single* type-conversion in this function.
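A simplified sketch of the resulting flow (`getBytes` being the usual stream API in the code-base):

```js
// The typed-array from the stream is compressed directly, without any
// intermediate string-representation.
async function compressStreamData(stream) {
  const bytes = stream.getBytes(); // The *single* type-conversion.
  const compressed = new Response(bytes).body.pipeThrough(
    new CompressionStream("deflate")
  );
  return new Uint8Array(await new Response(compressed).arrayBuffer());
}
```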
The current test fails intermittently only on Windows for unknown
reasons: the code is correct and on Linux it always passes. However, we
have already spent quite a lot of time on this test, so rather than
spending even more time on it I figured we should look at what behavior
the test is trying to check and find an alternative way to do it that
can't trigger this intermittent issue anymore.
This commit changes the test to use a term that only exists once in the
entire document so we cannot accidentally highlight another match
anymore. This doesn't change anything about the behavior that this test
aims to check: we still test searching in the XFA layer, we still test
that the original term is matched case-insensitively and we still test
that that match is actually highlighted. Note that the only objective of
the test is confirming that the search functionality covers the XFA
layer, so the exact phrase/match is not the interesting bit.
Given that we only use standard `import`/`export` statements now, after recent PRs, the "exports" global is unused.
Instead we add "__non_webpack_import__" to the `globals` to avoid having to sprinkle disable statements throughout the code.
Finally, the way that `globals` are defined has changed in ESLint and we should thus explicitly specify them as "readonly"; please find additional details at https://eslint.org/docs/latest/use/configure/language-options#specifying-globals
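In an `.eslintrc.js`-style configuration that amounts to something like this (sketch; the actual config format may differ):

```js
module.exports = {
  globals: {
    // Explicitly marked "readonly", as current ESLint versions expect.
    __non_webpack_import__: "readonly",
  },
};
```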
The previous change that set the timeout had effect because we have seen
quite a few protocol timeouts now correctly being raised in the context
of the active test, however we have also still seen a handful of cases
where this wasn't the case and the one second difference turned out to
be too low (likely because the operation was started slightly after one
second into the test run). We therefore tweak the value to be 75% of the
Jasmine timeout. This should be enough to catch operations that happen
later on in the test run, and if a single operation takes that long any
hope for success is already gone anyway.
It's not necessary because we have configured silent printing for
Firefox and Chrome in the browser arguments we pass in `test.mjs`. This
means that the print dialog is not even shown at all or disappears
automatically once printing is done, so the Escape key press serves no
purpose. Since it has been shown to time out, likely because the page
loses focus during printing, and because the page itself doesn't know
when the printing dialog is shown and we therefore can't possibly do the
key press at the right time anyway, this commit gets rid of it to
stabilize the test.
Those files only contain old debugging code that is not used/imported
anywhere anymore, which is generating code scanning alerts. Moreover,
they rely on globals/platform-specific code and don't import/export
logic properly.
Given the amount of work put into removing `require`-calls from the code-base, let's ensure that new ones aren't accidentally added in the future.
Note that we still have a couple of files where `require` is being used, in particular:
- The Node.js examples, however those will be updated to use `import` in PR 17081.
- The Webpack examples, and related support files, however I unfortunately don't know enough about Webpack to be able to update those. (Hopefully users of that code will help out here, once version `4` is released.)
- The `statcmp`-tool, since *some* of those `require`-calls cannot be converted to `import` without other code changes (and that file is only used during benchmarking).
Please find additional details at https://github.com/import-js/eslint-plugin-import/blob/main/docs/rules/no-commonjs.md
This *finally* allows us to mark the entire PDF.js library as a "module", which should thus conclude the (multi-year) effort to re-factor and improve how we import files/resources in the code-base.
This also means that the `gulp ci-test` target, which is what's run in GitHub Actions, now uses JavaScript modules since that's supported in modern Node.js versions.
For large/complex images it's possible that the image-data arrives in the API *after* the page has been scrolled out-of-view and thus been cleaned-up. In this case we obviously shouldn't cache such page-level data, since it'll first of all be unused and secondly can increase memory usage *a lot*.
Also, ensure that we *immediately* release any `ImageBitmap` data in this case to help reclaim memory faster.
The examples themselves were updated to account for JavaScript modules, which didn't require changing the actual URLs.
However, since it seems that JSFiddle doesn't support JavaScript modules in its separate "JavaScript" editing-area we need to change how we embed the examples to avoid showing a blank "JavaScript"-tab.
When pdfBug is true, the substitution font is used in the text layer, in order to be able to know, thanks to the devtools, which font is really used.
And to be sure that fonts are loaded, the font cache isn't cleaned up when the debugger is active.
The default protocol timeout is 180 seconds according to the
documentation at https://pptr.dev/api/puppeteer.browserconnectoptions,
but the Jasmine timeout we configure in the individual boot files is 30
seconds. The consequence of this is that if a protocol (CDP) error
occurs after 30 seconds Jasmine will fail the test, but the actual
protocol error from Puppeteer is raised much later in the context of
another test, which causes unrelated failures or tracebacks.
This commit fixes the problem by configuring Puppeteer to always use a
lower protocol timeout than the Jasmine timeout so that protocol errors
are always raised in the context of the test that actually triggered it.
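A sketch of the configuration (the exact values are illustrative):

```js
import puppeteer from "puppeteer";

const JASMINE_TIMEOUT = 30_000; // As configured in the boot files.

const browser = await puppeteer.launch({
  // Keep the protocol (CDP) timeout below the Jasmine timeout, so that
  // protocol errors surface in the test that actually triggered them.
  protocolTimeout: JASMINE_TIMEOUT - 1_000,
});
```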
It's been loaded as a JavaScript module for a long time, and given that the file is bundled as-is (without building) it seems reasonable to just change the file extension now.
The Windows bot is usually slower than the Linux bot, and therefore
text layer rendering is as well. However, the `autoprint` test awaited
text layer rendering to complete before activating the selector check,
which makes it timing-sensitive and causes it to never resolve because
the page is already printed (and the printed page div removed) by then.
This commit should fix the issue by activating the selector check as
soon as possible, namely as soon as the viewer appears, which should
ensure we're always registering the selector check in time because we're
doing it even before rendering is starting.
Comparing the currently supported browsers/environments, see [the FAQ](https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support) and the [MDN compatibility data](https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#browser_compatibility), the `structuredClone` polyfill is *only* needed in Google Chrome versions < 98. Because of some limitations in the core-js polyfill we're currently forced to special-case the `transfer` handling to prevent bugs, and it'd be nice to avoid that.
Note that `structuredClone`, with transfers, is only used in two spots:
- The `LoopbackPort` class, which is only used with fake workers. Given that fake workers should *never* be used in browsers, breaking that edge-case in older Google Chrome versions seem fine.
- The `AnnotationStorage` class, when Stamp-annotations have been added to the document. Given that Google Chrome isn't the main focus of development, breaking *part* of the editing-functionality in older Google Chrome versions should hopefully be acceptable.
To avoid problems with `export` statements in the QuickJS Javascript Engine, we can work-around that by *explicitly* exposing `pdfjsScripting` globally instead.
At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1]
In order for this to work we can *only* use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2]
One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting.
Note that my initial goal was to try and split these changes into a couple of commits, however that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3]
This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result.
One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4]
---
[1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility
[2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is.
[3] Having partially "broken" patches, that fail tests, as part of the commit history is *really not* a good idea in general.
[4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there *should* be much better support for JavaScript modules in various tools.
The user should *always* provide a correct `GlobalWorkerOptions.workerSrc` value when using the PDF.js library in browser environments. Note that the fallback:
- Has been deprecated ever since PR 11418, first released in version `2.4.456` over three years ago.
- Was always a best-effort solution, with no guarantees that it'd actually work correctly.
- With upcoming changes, w.r.t. outputting JavaScript modules, it'd now be more difficult to determine the correct value.
The minified default viewer has never been distributed in either official releases or through pdfjs-dist, which means that it's most likely unused; it has also never been tested nor actively maintained.
Setting the alpha-value explicitly to `1` in `rgb` colors is unnecessary, since that's the default value, and this way we ever so slightly reduce the size of our CSS files.
Unfortunately I've not found a Stylelint rule to enforce this automatically, and the patch was generated using search and replace.
When an editing button is disabled and focused, and the user presses Enter (or Space), an
editor is automatically added at the center of the current page.
Subsequent editors can then be created using the same keys within the focused page.
When an element has the hasOwnCanvas flag we must have an HTML container to attach
the canvas where the element will be rendered.
So the noHTML flag must take this information into account:
- in some cases the noHTML flag is reset depending on the hasOwnCanvas value;
- in some others, the hasOwnCanvas flag is set depending on the value of noHTML.
To reduce the risk of regressing something else, given that the issue only applies to a (for the default viewer) non-default configuration, this patch is purposely limited to only TextWidget-annotations in the display layer.
It happens only on Windows with Chrome.
For some reason the click event isn't correctly triggered there, but it seems
to work correctly with pointerup.
And it seems that when drawing an SVG on an OffscreenCanvas we need to wait
a little in order to be able to transfer it: that's why this patch adds
a check on the canvas content.
This has been deprecated since version `2.15.349`, which was released a year ago.
Removing this will also simplify some upcoming changes, specifically outputting of JavaScript modules in the builds.
This removes the only remaining old and non-standard handling of exports in the `web/`-folder, since some initial attempts at outputting JavaScript modules in the builds have identified this file as a potential problem.
While this uses a hard-coded list, for overall simplicity, I don't believe that that's a big problem since:
- Generating this file automatically would require a bunch more parsing *every single time* that the library is built.
- The official API-surface doesn't change often enough for this to really impede development in any significant way.
- The added unit-test helps ensure that this list cannot accidentally become outdated.
When the editor is invisible (because it's on a non-rendered page) its parent is null.
But when we undo its deletion, we need to have a parent to attach it to.
Given that this is accessed multiple times per page in the viewer, that leads to a number of (strictly speaking unneeded) function calls and allocated Objects for each invocation. By converting `layerProperties` to a, lazily initialized, Object we can avoid this.
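A minimal sketch of the pattern (illustrative names, not the actual viewer code):

```js
class PDFViewer {
  get layerProperties() {
    const properties = {
      annotationStorage: this._annotationStorage,
      linkService: this._linkService,
    };
    // Replace the getter with a plain data property, so that repeated
    // accesses return the cached Object without further function calls:
    Object.defineProperty(this, "layerProperties", { value: properties });
    return properties;
  }
}
```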
When PR 17015 removed the `disabled` handling for the "Save"-button it left a bunch of now unused CSS rules behind, which seems like a simple oversight.
Rather than shipping "dead" CSS rules, let's remove those until such a time that they're actually needed.
The old pre-processor used for CSS, and HTML, files leaves comments intact which unnecessarily contributes to the overall size of the *built* CSS files (note that the built JavaScript files don't include comments).
Rather than trying to "hack" comment removal into the pre-processor it seems easier to use a PostCSS plugin instead. The one potential issue is that it also affects *some* whitespaces, and it's not clear to me if this'll work with the various CSS-related tests that run in mozilla-central.
Please refer to https://www.npmjs.com/package/postcss-discard-comments for additional information.
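A sketch of how the plugin can be wired up in a build script (assuming the `postcss` and `postcss-discard-comments` packages):

```js
import postcss from "postcss";
import discardComments from "postcss-discard-comments";

async function stripCssComments(cssSource) {
  // `from: undefined` silences the source-map warning for string input.
  const { css } = await postcss([discardComments()]).process(cssSource, {
    from: undefined,
  });
  return css;
}
```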
*For many non-English locales the translated strings will be longer, which is easy to forget about during development/review.*
Note how for some locales (e.g. Swedish) the altText-button ends up looking horizontally "cramped", hence it seems reasonable to add a bit of inline padding to improve this.
In the scripting integration tests we use a few different typing
delays, mostly 100 or 200 milliseconds. According to for example
https://www.typingpal.com/en/documentation/school-edition/pedagogical-resources/typing-speed,
a fast typing speed is around 300 characters per minute, which is 5
characters per second and therefore a delay of 200 milliseconds between
each keystroke. Note that this is already above average, so in practice
the delay will be even larger. Therefore the 100 milliseconds variant
is unrealistically fast and therefore not suitable for the integration
tests which aim to simulate the average user behavior.
On top of that, the quick typing speeds are problematic for the tests
that involve validation alert dialogs appearing during typing. In those
tests a handler is registered to close the dialog once it pops up, but
it takes time for Puppeteer to notice the dialog, trigger the handler
and close it. If the typing delay, which is the delay between the key
down and key up events according to the Puppeteer source code at
https://github.com/puppeteer/puppeteer/blob/master/packages/puppeteer-core/src/cdp/Input.ts#L209-L215,
is too short, the key up event will be fired before the dialog is
closed. In that time the text box we're typing in is not focused, so
when the dialog is closed the `page.type()` call on the text box will
never resolve because the key up event never reached the text box.
This commit aims to fix the issues by converting all 100 millisecond
delays to 200 milliseconds. For instance the "must check input for US
zip format" failed pretty consistently locally before and hasn't failed
anymore with a 200 millisecond delay.
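In Puppeteer terms this is just the `delay` option of `page.type()`; for example (selector and text assumed, purely for illustration):

```js
// 200 ms between keystrokes corresponds to a *fast* typist at roughly
// 300 characters per minute:
await page.type("#zipField", "12345", { delay: 200 });
```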
This integration test fails often because we wait for scripting to be
ready before we check the printed page, but most of the time the PDF
is already done printing before scripting is reported to be ready.
This happens because the print trigger is on the `Open` event, which is
one of the first events to be dispatched and, most notably, before
scripting is marked as ready; please see
https://github.com/mozilla/pdf.js/blob/master/web/pdf_scripting_manager.js#L176-L191.
Given that the PDF document is only one page, printing it is usually
finished between triggering the `Open` event and scripting reported
to be ready. If this happens the printed page is already destroyed
before we get to our actual test, which will then timeout because it
will never find the printed page in the DOM.
This commit fixes the problem by not awaiting scripting to be ready
because the fact that the printed page appears is already enough to know
that autoprint was triggered (after all, there is no other user
interaction involved here). While we're here we also switch to the
shorter `page.waitForSelector` function.
but keep it for the text area.
Disable pointerdown on the alt-text button to disable dragging the editor
when the button is clicked (especially when slightly moving the mouse
between the down and the up).
The dialog element handles closing with <kbd>Esc</kbd> automatically, however we're not reporting telemetry in that case.
In order to fix that the easiest solution, as far as I'm concerned, seems to be moving the telemetry reporting into the dialog-close handler since it's always invoked.
This patch addresses an edge-case that'll probably never happen, but it nonetheless seems like something that we want to fix.
Note how we're using the `#currentEditor`-field to prevent opening the dialog when it's already active, with the field being reset once the dialog has been closed.
By also resetting the `#currentEditor`-field during destruction, instead of waiting until the dialog has actually closed (assuming it's currently open), there's a *tiny* window of time[1] during which the dialog could theoretically be (incorrectly) re-opened, thus causing out-of-sync state in the viewer-component.
---
[1] Since the "close" event, on a dialog-element, is dispatched asynchronously by the browser.
When the user edits an existing alt-text and removes it, we want to be able
to save this state and consequently remove the done state from the
alt-text button.
Remove the button from its parent when the editor is removed: it should
help to save a few kilobytes of memory.
Radio-buttons can also be toggled by clicking on their associated `label`-elements, and not only on the `input`-elements themselves, however it seems that "pointerdown" event listeners don't cover that case.
Hence it's possible that telemetry could miss certain cases of a mouse being used, and the easiest solution seems to be to instead use "click" event listeners and just ignore keyboard-based events.
Rather than trying to be "clever" here, and possibly affect code readability negatively, let's just restore the `collectFields` parameter to address the unneeded parsing that now happens when printing new Annotations.
Given the limitations of the old pre-processor that's used for CSS/HTML files, this unfortunately isn't as "easy" to implement as it is for JavaScript code.
Since this is the first case where we've wanted to do conditional CSS imports, rather than trying to completely re-write the pre-processor, this patch settles for handling it explicitly in the `expandCssImports` function.
Looking at the save-telemetry values they're all boolean *except* for "alt_text_edit" in one instance, since `this.#previousAltText` may be an empty string (looking at the `editAltText` method) and this value may thus become an empty string as well.
When closing a document in the viewer, e.g. by running `PDFViewerApplication.close()` in the console, the `AltTextManager.#finish` method currently throws *unless* the `altText` dialog is actually open.
Similar to e.g. the PasswordPrompt, we should thus only attempt to close the `altText` dialog when it's open.
In the rare situation that an optional content dictionary lacks a /Type-entry we currently throw, which may prevent e.g. Form XObjects from rendering completely.
Fixes https://bugs.ghostscript.com/show_bug.cgi?id=707147
It's a part of the UX specifications. There's a drawing issue in Firefox
(see bug https://bugzilla.mozilla.org/1853288) but setting the
background-clip property to content-box seems to be a good workaround.
Especially on slower bots there is some time between clicking the
element and the actual visibility change, but we didn't await this and
checked the visibility state immediately after clicking. This can be
reproduced 100% of the time by introducing a delay in the `display` and
`hidden` handlers of the `_commonActions` shadow call.
This commit fixes the problem by waiting until the first visibility
change actually happened before continuing with the assertions.
This integration test currently fails intermittently on the bots because
of the fixed timeout in the test, which is sometimes too low on slower
systems. The issue can be reproduced 100% of the time by introducing a
delay just before dispatching the `switchannotationeditormode` event.
Puppeteer also discourages this and recommends waiting for a
selector instead, which we now do here. This ensures that the test only
continues if the element under test is available and therefore prevents
any timing problems.
The x/y-coordinates are floats instead of integers like one might
expect. The current approach rounds both the old and the new
coordinates in order to do integer comparison. However, rounding each
coordinate individually causes too much loss of precision because,
depending on the decimal value, they are either rounded up or down
which causes intermittent off-by-one errors.
This commit fixes the problem by comparing coordinate differences
instead of the coordinates themselves. The precision loss is avoided
by subtracting the old from the new coordinate as-is and only rounding
the final result.
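A small illustration of the difference (values chosen purely for illustration):

```js
const oldX = 10.4; // previous coordinate
const newX = 10.6; // new coordinate; the real movement is only 0.2

// Rounding each coordinate individually yields a spurious off-by-one:
console.log(Math.round(newX) - Math.round(oldX)); // 1
// Rounding only the final difference avoids the precision loss:
console.log(Math.round(newX - oldX)); // 0
```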
This integration test currently fails intermittently on the bots because
of the fixed timeout in the test, which is sometimes too low on slower
systems. The issue can be reproduced 100% of the time by introducing a
delay in the `WidgetAnnotationElement.showElementAndHideCanvas` method.
Puppeteer also discourages this and recommends waiting for a
selector instead, which we now do here. This ensures that the test only
continues if the element under test is available and therefore prevents
any timing problems.
We already use `page.$eval` in most other integration tests and it's
simpler because it already takes the selector as argument, so we don't
have to do a separate `querySelector` call ourselves.
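For example (the selector and property are assumptions, purely for illustration):

```js
// One call that both queries the selector and evaluates in page context:
const value = await page.$eval("#pageNumber", el => el.value);
```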
When there is no tree, the tags for the new annotations are just put under the root element.
When there is a tree, we insert the new tags at the right place using the value
of structTreeParentId (added in PR #16916).
Now that modern JavaScript is fully supported also in the worker-thread we no longer need to keep old closures, which slightly reduces the size of the code.
Given that this is a shadowed getter, the `opMap` is already lazily initialized and it shouldn't be necessary to *also* use the `getLookupTableFactory` helper function here. Looking at the history of the code, it seems that this is simply a leftover from before JavaScript classes existed.
Now that modern JavaScript is fully supported also in the worker-thread we no longer need to keep old closures, which slightly reduces the size of the code.
Now that modern JavaScript is fully supported also in the worker-thread we no longer need to keep old closures, which slightly reduces the size of the code.
Now that modern JavaScript is fully supported also in the worker-thread we no longer need to keep old closures, which slightly reduces the size of the code.
Now that modern JavaScript is fully supported also in the worker-thread we no longer need to keep old closures, which slightly reduces the size of the code.
While this cache will not contain a huge amount of data in practice, it's nonetheless a *global* cache that currently will never be cleared.
This patch also removes the existing closure, since it shouldn't really be necessary nowadays given that the code is a JavaScript module which means that only explicitly listed properties will be exported.
When I started looking at PR 16938 it occurred to me that some of the new structTree-methods are synchronously accessing certain dictionary-data (not used during "normal" structTree-parsing), which may not be generally safe since everything in a dictionary could be a reference (and the relevant data may not have been loaded yet).
Rather than suggesting that we make all those new methods even more asynchronous, to me the overall simplest and safest solution is to ensure that the *entire* PDF document has been loaded *before* we begin saving it. In practice this shouldn't really affect "performance" of saving noticeably, since it's always depended on the entire PDF document being downloaded.
Finally note that with the exception of the PDF document possibly not having been fully downloaded when saving is triggered, all other "global" document properties are pretty much guaranteed to already be available at this point.
The unit test is re-enabled because it no longer seems to fail after 10
runs on Linux where this used to fail often. Code inspection also shows
that the code is correct and shouldn't raise the previous exception
anymore. Finally, a lot has changed since this test was disabled such
as new Jasmine versions, new Linux bot OS version and new browser
versions.
While it makes sense to check that the `destDict` parameter is indeed a Dictionary, since that data comes from the PDF document itself, the `resultObj` parameter is an internal PDF.js implementation detail that should always be correct (or tests will fail).
Over time the amount of "document level" data potentially needed during parsing of Annotations have increased a fair bit, which means that we currently need to ensure that a bunch of data is available for each individual Annotation.
Given that this data is "constant" for a PDF document we can instead create (and cache) it lazily, only when needed, *before* starting to parse the Annotations on a page. This way the parsing of individual Annotations should become slightly less asynchronous, which really cannot hurt.
An additional benefit of these changes is that we can reduce the number of parameters that need to be explicitly passed around in the annotation-code, which helps overall readability in my opinion.
One potential drawback of these changes is that the `AnnotationFactory.create` method no longer handles "everything" on its own, however given how few call-sites there are I don't think that's too much of a problem.
The classes were stripped out when creating the field name, but
it led to a wrong name.
Since class components in a path are irrelevant, they're just ignored
when searching for a node in the datasets.
The focus callback must only be called when the element has been blurred.
For example, the blur callback (which implies some potential validation) is not called
when the newly focused element is another tab, an alert dialog, etc., so consequently
the focus callback mustn't be called when the element gets its focus back.
While reviewing PR 16898 it occurred to me that it's currently impossible to trigger downloading of FileAttachment annotations using the keyboard.
Hence this patch adds `Ctrl + Enter` as the keyboard shortcut to download those, thus supplementing the existing double-clicking when using a mouse.
The goal is to always have something which is focusable to let the user select
it with the keyboard.
It fixes the mentioned bug because the annotation layer will now have a container
to attach the canvas for annotations having their own canvas.
`.grab-to-pan-grab:active` is `#viewerContainer` when the mouse is
pressed down. It is supposed to have a `cursor: grabbing` appearance
immediately on mousedown.
`.grab-to-pan-grabbing` is the overlay that is supposed to cover
everything, and also has the `cursor: grabbing` appearance. The "cover
everything" result is achieved through `position:fixed`, `inset:0`, etc.
The block with these CSS properties for "cover everything" is currently
shared by `.grab-to-pan-grab:active` and `.grab-to-pan-grabbing`, but
only "cursor" need to be shared. The original JS and CSS code at
https://github.com/Rob--W/grab-to-pan.js shows that these were supposed
to be associated with the overlay only.
The PR that added this to PDF.js also shows that the "cover everything"
CSS properties were supposed to be limited to the overlay only:
https://github.com/mozilla/pdf.js/pull/4209#discussion-diff-9285917
But the final version of the PR mistakenly merged them together.
This patch rectifies that mistake.
The typescript compiler is now configured to know about the import map
to be able to resolve those imports and find the associated types.
As tsc outputs declaration files using the original module identifiers
and not the resolved ones, tsc-alias is used to post-process the
declaration files by resolving those paths.
Some configuration settings like `paths` cannot be provided through CLI
arguments but only in a configuration file. And when using a
configuration file, only a few options (like `--outDir`) can still be
provided through the CLI.
Using `removeNullCharacters` on the URL should be completely redundant, given the kind of data that we're passing to the `addLinkAttributes` helper function. Note that whenever we're handling a URL, originating in the worker-thread, in the viewer that helper function is always being used.
Furthermore, on the worker-thread all URLs are parsed with the `createValidAbsoluteUrl` helper function, which uses `new URL()` to ensure that a valid URL is obtained. Note that the `URL` constructor will either throw, or in some cases just ignore them, when encountering `\u0000`-characters during parsing.
Hence it should be *impossible* for a valid URL to contain `\u0000`-characters and we can thus simplify the viewer-code a tiny bit. The use of `removeNullCharacters` is most likely a left-over from back when `new URL()` wasn't generally available in browsers.
Testing the `tagged_stamp.pdf` document locally in the viewer, I noticed that e.g. the /Alt entry for the StampAnnotation contains "Secondary text for stamp\u0000".
Elsewhere in the viewer we're skipping null-chars and it's easy enough to do that in the `StructTreeLayerBuilder` class as well. (Note that we generally let the API itself return the data as-is.)
This fixes invalid type references (either due to invalid paths for the
import or missing imports) in the JS doc, as well as some missing or
invalid parameter names for @param annotations.
The issue described in the mentioned bug is really because
Acrobat is rendering the XFA instead of the AcroForm.
The original patch just tried to work around the issue, but it
induced some regressions.
This was added in PR 14899, over a year ago, however it's still completely unused in the PDF.js library/viewer. In hindsight I think that it was a mistake to add unused functionality, and the issue should probably have been WONTFIXed instead, however we probably can't just remove it now.
Thanks to the pre-processor, we can at least exclude this code in the *built-in* Firefox PDF Viewer.
After the changes in PR 16828 the `StampEditor` can now be initialized with a File, in addition to a URL, hence it seems that the `isEmpty` method ought to take that property into account as well.
Looking at this I also noticed that the assignment in the constructor may cause the `this.#bitmapUrl`/`this.#bitmapFile` fields to be `undefined`, which "breaks" the comparisons in the `isEmpty` method.
We could obviously fix those specific cases, but it seemed overall safer (with future changes) to just update the `isEmpty` method to be less sensitive to exactly how these fields are initialized and reset.
Currently we're repeating virtually the same code *four times* when fetching the bitmap-data, which seems unnecessary.
Also, ensure that the `#bitmapPromise` is always `null`ed by moving that into the `StampEditor.#getBitmapDone` method.
Currently this unit-test will pass just fine if compression is disabled, e.g. by commenting out the relevant code in the `src/core/writer.js` file.
While we don't have a simple way of *directly* checking that the Annotation text-content is compressed, we can however use the resulting file-size as a fairly good proxy. (Note that if compression is disabled the file-size is more than doubled.)
Please note that for performance reasons it's not really advised to use the same worker-thread *in parallel* for parsing multiple PDF documents, since they will then unnecessarily compete for resources.
However, given that it's still possible to do that e.g. when using the global `workerPort` it probably won't hurt to add a unit-test for this particular situation.
Given that the `PDFDocumentLoadingTask.destroy()`-method is documented as being asynchronous, you thus need to await its completion before attempting to load a new PDF document when using the global `workerPort`.
If you don't await destruction as intended then a new `getDocument`-call can remain pending indefinitely, without any kind of indication of the problem, as shown in the issue.
In order to improve the current situation, without unnecessarily complicating the API-implementation, we'll now throw during the `getDocument`-call if the global `workerPort` is in the process of being destroyed.
This part of the code-base has apparently never been covered by any tests, hence the patch adds unit-tests for both the *correct* usage (awaiting destruction) as well as the specific case outlined in the issue.
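A sketch of the *intended* usage pattern when a global `workerPort` is configured:

```js
// The previous loadingTask must be destroyed, and that destruction
// awaited, before the shared workerPort may be used again:
const loadingTask = pdfjsLib.getDocument(url1);
const pdfDocument = await loadingTask.promise;
// ... use the document ...
await loadingTask.destroy();

// Only now is it safe to load another document. Forgetting the `await`
// above used to hang this call indefinitely; with this patch it throws:
const nextLoadingTask = pdfjsLib.getDocument(url2);
```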
- The `src/core/unicode.js` exclude ought to have become unnecessary already with PR 16200, which significantly shortened and simplified that file.
- The `src/core/glyphlist.js` exclude no longer seems necessary in practice either, possibly because of improvements in Babel.
The main stamp button will now just enter an add/edit image mode:
- the user can add a new image using the new button;
- the user can edit an image by resizing or moving it.
In image mode, when the user clicks on the page but not on an editor,
then all the selected editors will be unselected.
After the `src/core/`-changes in PR 16779 the `PDFDocumentProxy.getJSActions` method should no longer be able to return *empty* entries, which means that we can simplify the "JavaScript support is not enabled"-warning in the viewer.
Furthermore, improve the auto-printing hack used when scripting is disabled.
Given that the FieldObjects are parsed in parallel, in combination with the existing caching in the `getPage`-method and `annotations`-getter, adding additional caches for this fallback code-path doesn't seem entirely necessary.
We're adding the action to the undo/redo stack whatever the status of the
operation was. This patch aims to add the action only when the image has been
successfully added.
When several editors are selected and the window loses and then regains focus,
the previously focused editor triggers its focus callback, making it the only
selected one.
This patch aims to avoid triggering the focus callback when the main window
gets its focus back.
When moving an element in the DOM, the focus is potentially lost, so we need to make sure
that the element focused before the move gets its focus back afterwards.
But we must take care to not execute any focus/blur callbacks, because the user didn't
do anything which should trigger such events: it's an implementation detail. For example,
when several editors are selected and moved, the same ones must still be selected at the end, so
no element should receive a focus event which would set it as the only selected one.
There are two rotations we have to deal with: the viewer one and the editor one.
The previous implementation was a bit complex, and having to deal with these
rotations would potentially have increased that complexity.
So this patch aims to simplify the implementation and deal with all the possible
cases.
The main idea is to transform the mouse deltas according to the rotations and then
apply the resizing in the page coordinates system.
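An illustrative sketch of the core idea (the sign conventions here are arbitrary; this is not the actual editor code):

```js
// Map pointer deltas into the page coordinate system, given the combined
// viewer/editor rotation (always a multiple of 90 degrees):
function toPageDeltas(dx, dy, rotation) {
  switch (((rotation % 360) + 360) % 360) {
    case 90:
      return [dy, -dx];
    case 180:
      return [-dx, -dy];
    case 270:
      return [-dy, dx];
    default:
      return [dx, dy];
  }
}
```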
When resizing an editor we're currently using unidirectional cursors, please refer to https://developer.mozilla.org/en-US/docs/Web/CSS/cursor
Given that editors can (generally) be resized to become either smaller or larger, it seems overall more appropriate to use bidirectional cursors to make this clearer to the user.
Note that as mentioned in the MDN article some environments, which seems to apply to e.g. Windows 11, don't differentiate between the two cursor formats and simply use bidirectional ones unconditionally.
One additional benefit of these changes is that the relevant CSS rules become slightly more compact.
We obviously don't want to re-introduce any `require` usage in e.g. the viewer, since we should strive to only use native `import` statements wherever possible.[1]
Hopefully exposing e.g. the library globally in more cases won't break anything, however it's somewhat difficult for me to imagine all the ways in which third-party users may be accessing the PDF.js library. (Given the lack of a runnable test-case in the issue, I also cannot guarantee that this is enough to fully address the problem.)
---
[1] Ideally we should probably not rely on e.g. `pdfjsLib` being globally available in the *built* viewer, and rather always `import` the library instead.
Unfortunately this would require larger (possibly breaking) changes in the builds that we provide, however note that Firefox only recently got support for `import` in workers and that Webpack still only have *experimental* support for outputting "proper" modules.
This method is very old, however with the exception of the auto-print hack (when scripting is disabled) in the viewer it's never actually been used.
Most likely the idea with `PDFDocumentProxy.getJavaScript` was that it'd be useful if scripting support was added, however it turned out that it was a bit too simplistic and instead a number of new methods were added for the scripting use-cases.
Without this patch the password dialog is pretty difficult to use in the GeckoView-viewer, because of a number of missing CSS variables.
*Please note:* This patch makes no effort at actually styling the dialog to better suit the overall look of the GeckoView-viewer, but focuses solely on making it actually usable (since password protected PDF documents are somewhat rare).
If the current PDF document is closed while the password dialog is open, e.g. manually by calling `PDFViewerApplication.close()` from the console, the password dialog wouldn't be closed as intended.
*Please note:* This could only affect the GENERIC viewer, although it's very unlikely to ever happen, since that's the only one that supports opening more than one PDF document.
*Please note:* This situation should never happen in practice, but it nonetheless cannot hurt to fix this.
If the `PasswordPrompt.open` method would ever be called synchronously back-to-back *and* if opening of the dialog fails the first time, then the second invocation would remain pending indefinitely since we just clear out the capability.
Given that the `useOnlyCssZoom` option is essentially just a special-case of the `maxCanvasPixels` functionality, we can combine the two options in order to simplify the overall implementation.
Note that the `useOnlyCssZoom` functionality was only ever used, by default, in the PDF Viewer for the B2G/FirefoxOS project (which was abandoned years ago).
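Conceptually, a viewer that previously passed `useOnlyCssZoom: true` would now express the same thing through the combined option; a sketch (assuming that a `maxCanvasPixels` value of `0` is the CSS-only sentinel):

```js
const pdfViewer = new pdfjsViewer.PDFViewer({
  container,
  eventBus,
  // 0 is assumed to mean "always use CSS zooming", replacing the old
  // `useOnlyCssZoom: true` option (-1 would mean "no canvas limit"):
  maxCanvasPixels: 0,
});
```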
When searching for "endobj"-operators, make sure that we don't accidentally match a "trailer"-string in /Content-streams without /Filter-entries (i.e. streams that contain "raw" and thus human-readable data).
Currently we accidentally accept `cMapUrl` and `standardFontDataUrl` parameters that are empty strings or `null`, since e.g. `new URL(null, document.baseURI)` doesn't throw, when validating the `useWorkerFetch` parameter via the `isValidFetchUrl` helper function.
Please note that we are currently failing gracefully in this case, as intended, however the warning-messages printed in the console are perhaps less helpful without this patch.
When an editor is selected using the keyboard it has the focus.
But if the editor is then unselected with the Escape key, the focus must
be removed, otherwise we still have a blue outline around it.
Also add a few missing timeouts in the integration tests.
This is quite old code, however the error-handling no longer seems necessary for a couple of reasons:
- The `PDFViewerApplication.open` method is asynchronous, which means that it cannot throw a "raw" `Error` and the try-catch is not needed in that case.
- None of the other affected methods should throw, and if they do that'd rather indicate an *implementation* error in the code.
- Finally, and most importantly, with the `PDFViewerApplication.run` method now being asynchronous an (unlikely) `Error` thrown within it will lead to a rejected `Promise` and not affect execution of other code.
We can use modern JavaScript features, in this case optional chaining, to (ever so slightly) simplify how `ViewHistory` errors are handled.
Also, use arrow functions when handling a few other (very rare) errors during loading since that's a tiny bit shorter.
The way that the callback-methods are specified feels unnecessarily verbose, however we can introduce a short-hand to improve this.
Also, add a couple of new-lines to improve overall readability.
Selected editors can be moved using the arrow keys:
- up/down/left/right will move the editors by 1 page unit;
- ctrl (or meta)+up/down/left/right will move them by 10 page units.
The keyboard shortcuts (copy, paste, ...) didn't work correctly when the
main container was not focused.
This patch adds a few waitForTimeout calls in the integration tests for FreeText
in order to avoid possible intermittent failures.
Given that the `debugger` is loaded as a module we can use "top level await" in development mode to access the necessary API-functionality, which removes the need to manually pass in the required properties.
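Roughly along these lines (the module specifier is an assumption):

```js
// Since the debugger is a module, top-level await can pull in the API
// lazily instead of having the properties passed in by the viewer:
const { OPS } = globalThis.pdfjsLib || (await import("pdfjs-lib"));
```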
- it'll improve the way images are resized: diagonally (keeping the ratio between dimensions)
or horizontally/vertically.
- the resizer was almost invisible in HCM.
- make a resize undoable/redoable.
In order to reproduce the original issue:
- switch to freetext mode
- add a text somewhere
- double click outside and add some text
- repeat the previous step several times
and observe that no text is selected during editing.
The existing Node.js-specific polyfills depend on the `node-canvas` package, which has unfortunately (repeatedly) shown to cause trouble for many users. We attempted to improve the situation by listing the relevant packages as `optionalDependencies`, but that didn't seem to really fix the problem.
With this patch the library should be able to load in Node.js-environments even if polyfilling fails, and any errors will instead occur during rendering. Obviously this is *not* a proper solution, since it basically moves the problem to another part of the code-base.
However for certain "simpler" use-cases, such as e.g. text-extraction, these changes should hopefully improve general usability of the PDF.js library in Node.js-environments.
*Please note:* For most PDF documents rendering should still work though, since `DOMMatrix` is *currently* only used with Patterns and `Path2D` only with Type3-fonts and Patterns.
In Gulp 4, which we have used for years now, the `gulp.src()` function
supports the `removeBOM` option to disable the default BOM stripping,
so this commit uses that to get rid of our `vinyl-fs` dependency.
Note that this actually makes disabling BOM stripping work again. It's
currently broken because in `vinyl-fs` 3, that we already use since 2018
in commit 95de23e, the `stripBOM` option was renamed to `removeBOM`, so
the current code doesn't actually disable BOM stripping, which we have now
confirmed and which has sadly been broken for years without anyone noticing. Most likely
this is because the BOM is not required for UTF-8 documents, but while
not necessary it also can't hurt to have it for tools that use it to
determine if a document is UTF-8.
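A sketch of the Gulp 4 equivalent (the glob and destination are assumed):

```js
import gulp from "gulp";

// Disable the default BOM stripping directly in `gulp.src()`, making the
// separate `vinyl-fs` dependency unnecessary:
gulp
  .src("l10n/**/*.properties", { removeBOM: false })
  .pipe(gulp.dest("build/l10n/"));
```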
*Please note:* This only removes the preference itself, however both the viewer-option and the actual implementation is still available.
The `useOnlyCssZoom` functionality was only ever used, by default, in the PDF Viewer for the B2G/FirefoxOS project (which was abandoned years ago). Given that CSS-only zooming can easily make the document look blurry even at low zoom levels, this functionality was only intended for low-powered mobile devices.
Hence it seems reasonable to remove the `useOnlyCssZoom` preference now, since neither the default viewer nor the GeckoView-specific viewer uses this functionality.
Trying to update Stylelint to version `15.10.1`, and beyond, broke linting. Looking at the changes the issue appears to be that the `bin/stylelint.js` file was replaced with `bin/stylelint.mjs` instead, which our `gulp lint` runner wasn't able to automatically find; see https://github.com/stylelint/stylelint/compare/15.10.0...15.10.1
When the flag is set, the appearance has to be generated from the value so it's
useless/meaningless to extract the content from the existing appearance.
When a PDF has /NeedAppearances set to true, the annotation appearance must be
generated from its value and we must take into account the hasOwnCanvas property.
*Please note:* I'm not aware of any bugs caused by this, however that might be more luck than anything else.
In PR 16392 the `incrementalUpdate` function, and all of its various helpers, were made asynchronous. However the call-site in `src/core/worker.js` wasn't updated, which means that we currently reset temporary XRef-entries while saving is ongoing.
By leveraging import maps we can get rid of *most* of the remaining `require`-calls in the `src/display/`-folder, since we should strive to use modern `import`-statements wherever possible.
The only remaining cases are Node.js-specific dependencies, since those seem very difficult to convert unless we start producing a bundle *specifically* for Node.js environments.
With the changes in the previous patch the `isNodeJS`-helper no longer needs to live in its own file, which helps get rid of a closure in the *built* files.
In the last couple of years we've been quicker to remove support for older browsers/environments, which means that at this point in time we don't bundle that many polyfills. (The polyfills are also generally simpler nowadays, ever since we removed support for e.g. Internet Explorer.)
Rather than having to *manually* handle the polyfills, we can actually let Babel take care of bundling the necessary polyfills for us; please refer to https://babeljs.io/docs/babel-preset-env
The only exception here is the Node.js-specific compatibility-code, which is moved into the `src/display/node_utils.js` file. This ought to be fine since workers are not available/used in Node.js-environments.
*Please note:* For the `legacy`-builds this will increase the size of the *built* files, however that seems like a very small price to pay in order to simplify maintenance of the general PDF.js library.
Localization of this button broke in PR 16340, which I assume was completely accidental, since the download-button now tries to access a l10n-id that was removed some time ago (see PR 15617).
Note how loading even the development viewer, i.e. http://localhost:8888/web/viewer-geckoview.html#locale=en-US, currently logs l10n-warnings on the `master` branch.
Having a `require` in this file has never made sense in e.g. the Firefox PDF Viewer and shouldn't really be necessary.
Possibly the idea was to facilitate some kind of third-party bundling, however the *built* `pdf.js` file has always exposed the API-contents globally.
Currently this class contains a few "special" code-paths for the COMPONENTS build-target, which normally wouldn't be a problem. However, in this particular case that means accessing code that we don't want to include unconditionally in all builds.
This is currently implemented using build-time `require`-calls which we nowadays want to avoid, and we should strive to remove all such cases from the code-base. (Generally speaking `import` is the future, and build-tools may not always play well with a mix of both formats.)
We can easily improve things here by using sub-classing for the COMPONENTS build-target, and then use the ability to re-name when exporting (to avoid breaking existing code).
Occasionally some test-suites may fail to start on the bots, however that's not correctly reflected in the botio-output posted to GitHub which makes it easy to accidentally overlook this situation.
Looking at the raw logs when that happens they always seem to contain a line such as `Run NaN tests` which means that we should be able to easily make this situation a *failure* as intended.
In order to reproduce the issue:
- scale down the image;
- zoom the page: the image is pixelated.
So this patch allows redrawing the image when zooming.
- Take into account the page translation,
- Take into account the correct translation for the editor border,
- Take into account the position of the first glyph in the annotation,
- Take into account the rotation of the editor.
Closes #16633.
There's no good reason for getting this option multiple times in the same method. Also, we can slightly re-factor how the `editorStampButton` is made visible.
- Do the /Filter and /DecodeParms lookup in parallel, since that ought to be a *tiny* bit more efficient.
- Avoid code-duplication when `CompressionStream` isn't supported, since we already have a fallback code-path at the end of the function.
This regressed in PR 16659, when the signature of the `PDFViewer.annotationEditorMode`-setter was changed, and it currently leads to an Error being thrown when exiting PresentationMode.
Unfortunately I wasn't able to come up with a *simple* way to just replace the synchronous `require`-call, since we need to ensure that the default preferences are available when bundling starts.
Hence this patch adds a new intermediate parsing-step in all the relevant gulp-tasks, but this shouldn't affect build-times noticeably since the amount of extra parsing is very small.
*Please note:* It's very possible that there's a better way to handle this, however I figured that unblocking further ESM-work is more important than a "perfect" solution.
createImageBitmap doesn't work with SVG files (see bug 1841972), so we need to work around
this by using an Image.
When printing/saving we must rasterize the image, hence we use the biggest bitmap as the
image reference to avoid duplication or poor rendering quality.
The existing code is unable to *correctly* extract the color from the appearance-stream when the ColorSpace-data is "complex". To reproduce this:
- Open `freetexts.pdf` in the viewer.
- Note the purple color of the "Hello World from Preview" annotation.
- Enable any of the Editors.
- Note how the relevant annotation is now black.
Note how we're accidentally using the wrong operator when trying to parse CMYK colors. I'm not aware of any bugs caused by this, since it seems uncommon in practice for annotations to specify text-colors in CMYK format.
When there was a rotation, the generated bbox was wrong because of an inversion
between width and height.
This patch aims to fix this issue by re-writing the FreeText generation code
to have something similar to what Acrobat does.
It also fixes the name of the font, which wasn't the correct one when calling the
evaluator.
- Update the "Getting the Code" section to specifically mention Mozilla Firefox, since while the development viewer *works* it may look slightly "broken" in Chromium-based browsers. (This is caused by a lack of support for unprefixed CSS properties, e.g. `mask-image`, however this does *not* affect the built PDF.js viewer.)
- Remove the Twitter-link, since that account has not been updated since 2016 (i.e. over seven years ago).
Semantically, it is more correct to encode the fragment in the URL
instead of the URL-encoded `file` query parameter. This shouldn't matter
in practice, because `rewriteUrlClosure` in `chromecom.js` decodes the
`file` parameter and restores the fragment. However, as #16625 shows,
there was a case where this did not work as expected.
`PDFViewerApplication` reads from `location.hash` to initialize
`initialBookmark`. But when extensions/chromium/pdfHandler.js prepares
the redirect URL, the reference fragment is encoded instead of bare.
`rewriteUrlClosure` in `chromecom.js` is responsible for decoding the
URL, but that currently runs too late.
To fix this, update `initialBookmark` after rewriting the URL.
This was not a problem in the past because `rewriteUrlClosure` in
`chromecom.js` executed before the initialization of `initialBookmark`.
Given that the PDF.js library has never officially supported/documented that binary data can be provided as a `Buffer`, and that it's been explicitly deprecated in *four* releases, it seems reasonable that we outright reject such data instead (to reduce the amount of Node.js specific code-paths).
We've now been throwing an Error in *three* releases if the `canvasFactory` option is provided, hence it ought to be fine to stop doing that and simply ignore the option instead.
Rather than having to *manually* determine the potential `transfers` at various spots in the API, we can let the `AnnotationStorage.serializable` getter include this.
To further simplify things, we can also let the `serializable` getter compute and include the `hash`-string as well.
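Conceptually the call-sites then collapse to something like this (the property names and the message-handler usage are illustrative, not the exact API):

```js
// One getter provides the serialized map, its hash, and the transfers
// needed for postMessage:
const { map, hash, transfers } = annotationStorage.serializable;
messageHandler.send("SaveDocument", { annotationStorage: map, hash }, transfers);
```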
These options are completely unused in the PDF.js viewer, and given that the last update of the `GrabToPan`-code from upstream was in 2016 it shouldn't hurt to remove them.
Before this commit, lint-chromium complained without an obvious course
of action:
> Warning: Pref objects doesn't have the same length.
> Error: chromium/preferences_schema is not in sync
With this commit, the error message is more actionable:
> Warning: extensions/chromium/preferences_schema.json does not contain an entry for pref: enableFloatingToolbar
> Error: chromium/preferences_schema is not in sync
This is something that I completely overlooked during review of PR 16593, since the idea is (obviously) that the viewer-components should be usable as-is without the user needing to manually pass in any *additional* parameters.
To support this we can very easily expose the current `FilterFactory`-instance on the `PDFPageProxy`-class[1], and if needed initialize the highlight-filters when initializing the page (again limited to the viewer-components).
In order to minimize the size of a saved PDF, we generate only one
image and use a reference to it in each annotation using it.
When printing, it's slightly different since we have to render each page
independently, but we use the same image within a page.
- Modify the text and background colors in the popup to fit a11y requirements.
- Add a backdrop filter on clickable areas using an SVG filter mapping
canvas colors to the Highlight and HighlightText ones.
It occurred to me that we can actually run this unit-test in Node.js environments by making use of the preprocessor to stub out the browser globals there.
Given that nullish coalescing is now available in all environments/browser that we support, we can (ever so slightly) simplify handling of the `TESTING` build-target.
Until now we've not actually had *any* tests that ensure that the *official* PDF.js-viewer API exposes the intended functionality, which means that things can easily break accidentally.
*Please note:* This unit-test cannot (easily) be run in Node.js-environments, since the `external/webL10n/l10n.js` file contains various browser-specific functionality.
These constants were added "speculatively" in PR 10820, almost four years ago, but have never actually been used. We already have issue 10982 that tracks *potentially* extending support for the affected annotation-format, however until that happens I really don't think that we should keep shipping completely unused code in the PDF.js library.
For the MOZCENTRAL build-target, i.e. the Firefox PDF Viewer, this reduces the total bundle size by 1.1 kilobytes.
Until now we've not actually had *any* tests that ensure that the *official* PDF.js API exposes the intended functionality, which means that things can easily break accidentally.
- Change (most) fields/methods into private ones, since that's now supported.
- Tweak the constructor-parameters, and simplify the sandbox initialization w.r.t. the viewer components.
- Remove some unused function/method parameters.
- Slightly simplify the "updatefromsandbox"-handler by using local variables and inverting some conditions.
Rather than sprinkling pre-processor statements throughout the viewer-code, simply "disable" the relevant `PDFViewer` setters instead.
Also, given that the GeckoView-specific viewer doesn't have a sidebar we don't actually need to explicitly ignore a `pageMode` during loading.
This helper function was added almost two years ago, in PR 13696, and it still has only a single call-site. Furthermore, with the changes made in PR 16572 it also cannot hurt to reduce the size of the `web/l10n_utils.js` file slightly.
Note how the [`ChromeActions.getPreferences` method](https://searchfox.org/mozilla-central/rev/4e8f62a231e71dc53eb50b6d74afca21d6b254e9/toolkit/components/pdfjs/content/PdfStreamConverter.sys.mjs#497-530) returns the preferences as a string, which we then have to convert back into an Object in the viewer.
Back when that code was originally written it wasn't possible to send Objects from the platform-code, however that's no longer the case and we should be able to (eventually) remove this unnecessary string-parsing now.
*Please note that in order to prevent breakage we'll need to land these changes in stages:*
- Land this patch in mozilla-central, as part of the regular PDF.js updates.
- Change the return type in the `ChromeActions.getPreferences` method, in a mozilla-central patch.
- Remove the string-handling from the `FirefoxPreferences._readFromStorage` method.
Please note that we've never had any functionality in the viewer itself that *set* preferences, and we've thus only ever read them.
For the GENERIC viewer it obviously makes sense for the user to be able to modify preferences, e.g. via the console, but that doesn't really apply to the *built-in* Firefox PDF Viewer since preferences are already accessible via `about:config` there. Hence it does seem somewhat strange to expose a (limited part of the) Firefox preference system in this way when we're not even using it.
Note that the unused preference setting-code also include a fair amount of *additional* validation on the platform-side, such as limiting any possible preference changes to the `pdfjs.`-branch and also an explicit white-list of preference names[1], to make sure that this is safe; please see:
- https://searchfox.org/mozilla-central/rev/4e8f62a231e71dc53eb50b6d74afca21d6b254e9/toolkit/components/pdfjs/content/PdfStreamConverter.sys.mjs#458-495
- https://searchfox.org/mozilla-central/rev/4e8f62a231e71dc53eb50b6d74afca21d6b254e9/toolkit/modules/AsyncPrefs.sys.mjs#21-48
Assuming that this patch lands, I'll follow-up with a mozilla-central patch to remove the code mentioned above.
---
[1] This hard-coded list contains preferences that no longer exist, and also at least one (fairly obvious) typo.
This method was added only for consistency with the `register`-method, however it's never actually been used. To avoid including dead code in the builds, let's just remove the `unregister`-method for now.
*Please note:* If this method ever becomes useful, it'll be trivial to revert this commit.
With the changes in PR 16552 we can now move general translation into the `AnnotationLayer` itself, which should improve things ever so slightly in third-party implementations where the default viewer isn't used.
*This is something that I completely overlooked during review of PR 16552, despite leaving a l10n-related comment.*
The new l10n-handling of PopupAnnotations assume that the `AnnotationLayer` is always initialized with a l10n-instance, which might not actually be the case in third-party implementations where the default viewer isn't used.
To work-around that we'll now bundle, and fallback on, the existing `NullL10n`-implementation in GENERIC builds of the PDF.js library. This will only result in a slight file-size increase for the *built* `pdf.js` file, again limited to GENERIC builds, since the `web/l10n_utils.js` file has no dependencies.
Also, tweaks a couple of TESTING pre-processor checks to *only* include that code when running the reference tests.
- it'll help to be able to move popups on screen to let the user read the text
- popups won't inherit some properties from their parent:
- the popup can be misrendered if for example the parent has a clip-path property.
- add an outline to the popup when the parent is focused.
- hide a popup when it's clicked.
Fix handling of /Filter-entries, since the current implementation could potentially corrupt the data if there are multiple filters present.
Please note that filters are applied *sequentially* during decoding, starting from the first one in the Array, hence the first Array-entry needs to be /FlateDecode in order for things to actually work correctly.
To prevent a future bug, if we want to save more "complex" data such as images, also ensure that we include any existing /DecodeParms-entries when updating the /Filter-entry.
The existing unit-test doesn't work as intended, since the page never actually renders. Note how `cleanup` is *not* allowed to run when parsing and/or rendering is ongoing, however an (old) incorrect condition could prevent rendering from ever starting.
This is very old code, which has been slightly re-factored a couple of times (many years ago), however this doesn't appear to affect e.g. the default viewer since the incorrect behaviour seem highly dependent on "unlucky" timing.
Note also how at the start of the `PDFPageProxy.prototype.render`-method we purposely cancel any pending `cleanup`-call, to prevent unnecessary re-parsing for multiple sequential `render`-calls.
Finally, avoid running `cleanup` when document/page destruction has already started since it's pointless in that case.
After PR 16226 the deprecated SVG back-end is now unused in development mode, with the exception of unit-tests, hence we can re-factor how it's exposed in the API to avoid including a useless webpack-closure in e.g. the *built-in* Firefox PDF Viewer.
Given that this API method isn't used anywhere within the PDF.js library itself, except for the unit-tests, we can avoid including what's effectively dead code in e.g. the *built-in* Firefox PDF Viewer.
- Don't attempt to lookup an "SM" entry, since we're only using "SMask" in the `PDFImage` code and I also cannot find any mention in the PDF specification about that being a valid abbreviation for a Soft Mask entry. (There's only a `SM = Smoothness Tolerance` Graphics State parameter, which is obviously something completely different.)
- Don't lookup the /SMask and /Mask entries unless it's actually an inline image, since it's pointless otherwise.
- Last, but most importantly, only check for the *existence* of /SMask and /Mask entries but don't actually fetch the data. Note that if either one exists it'll contain a Stream, and those cannot be cached on the `XRef`-instance, which leads to unnecessary parsing/allocations and in this case we're not using the actual data for anything (see the sketch below).
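The last point boils down to a plain existence check; a sketch using the core `Dict` API:

```js
// Checking for the *existence* of the entries, rather than fetching
// them, avoids resolving Stream data that is never actually used here:
const hasMaskEntry = dict.has("SMask") || dict.has("Mask");
```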
The original `trimCache` functionality was intended to be exposed on the
top-level `puppeteer` module, but due to a bug in Puppeteer this didn't
work correctly and we had to call `trimCache` on the default Puppeteer
node instance instead, which was fortunately exposed. However, since
this didn't feel like intended API usage, this bug was reported and is
now fixed in Puppeteer 20.5.0, so this commit updates Puppeteer to that
version so we can use the intended API.
The full history of this issue can be found at
https://github.com/puppeteer/puppeteer/issues/10174.
This patch is the result of me going through some old issues regarding non-embedded Wingdings support.
There's a few different things wrong in the referenced PDF document:
- The /BaseFont and /FontName entries don't agree on the name of the fonts, with one font using `/BaseFont /Wingdings-Regular` and `/FontName /wg09np` which obviously makes no sense.
To address this we'll compare the font-names against our lists of known ones and ignore /FontName entries that don't make sense iff the /BaseFont entry is a known font-name.
- The non-embedded Wingdings font also set an incorrect /Encoding, in this case /MacRomanEncoding, which should have been fixed by PR 16465. However this doesn't work since the font has *bogus* font-flags that fail to categorize the font as Symbolic.
To address this we'll also compare the font-name against the list of known symbol fonts.
As far as I can tell there's no particular reason for initializing `KeyboardManager`-instances eagerly, since the user may never use editing, and we can easily do this lazily instead by utilizing shadowed getters.
While it's slightly difficult to trigger in practice, unless the `defaultZoomDelay`-value is increased, it's currently possible to generate thumbnails from *partially* rendered pages when doing *temporary* CSS-only zooming.
We shouldn't dispatch a "pagerendered"-event when doing *temporary* CSS-only zooming, but simply wait until the actual rendering is done.
While I don't believe that this regression has caused any actual bugs, dispatching *duplicate* events is nonetheless inconsistent and should be fixed.
This patch updates the minimum supported browsers as follows:
- Google Chrome 92, which was released on 2021-07-20; see https://support.google.com/chrome/a/answer/10314655
Note that nowadays we usually try, where feasible and possible, to support browsers that are about two years old. By limiting support to only "recent" browsers we reduce the risk of holding back improvements of the *built-in* Firefox PDF Viewer, and also (significantly) reduce the maintenance/support burden for the PDF.js contributors.
*Please note:* As always, the minimum supported browser version assumes that a `legacy`-build of the PDF.js library is being used; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
Given that this functionality is only relevant in third-party use-cases, for example the viewer-components, we can avoid needlessly including it in e.g. the MOZCENTRAL build.
This commit makes the following required changes:
- Replace custom cache trimming logic in favor of the (per our request)
newly added `trimCache` method in Puppeteer. Not only does this greatly
simplify our code and prevents having to import Puppeteer internals,
it's also necessary because Puppeteer 20 removed the `BrowserFetcher`
API in favor of the new separate `@puppeteer/browsers` package.
- Start browsers in series instead of in parallel. Parallel browser
starts broke since Puppeteer 19.1.0 and it turns out that it has never
been supported officially, so it worked more-or-less by accident.
Starting browsers in series is the supported way, is almost equally
fast and ensures that we avoid any race conditions during startup.
Finally, it also allows us to remove the `browserPromise` state on our
session objects.
Fixes #15865.
This patch does two things:
- Moves the updating of thumbnails into `web/app.js`, via a new `PDFSidebar` callback-function, to avoid having to include otherwise unnecessary parameters when initializing a `PDFSidebar`-instance.
- Only attempt to generate thumbnail-images from pages that are *cached* in the viewer. Note that only pages that exist in the `PDFPageViewBuffer`-instance can be rendered, hence it's not actually meaningful to check every single page when updating the thumbnails.
For large documents, with thousands of pages, this should be a tiny bit more efficient when e.g. opening the sidebar since we no longer need to check pages that we know have not been rendered.
The way that the cleanup was implemented in PR 12613 has always bothered me slightly, since the `isPageCached`-method that I introduced there always felt quite out-of-place in the `IPDFLinkService`-implementations.
By introducing a new "thumbnailrendered" event, similar to the existing "pagerendered" one, we're able to move the cleanup handling into the `PDFViewer`-class instead.
The way that this was implemented in PR 10217 has always bothered me slightly, since the `isPageVisible`-method that I introduced there always felt quite out-of-place in the `IPDFLinkService`-implementations.
Hence this is instead replaced by a callback-function in `PDFFindController`, to handle the page-visibility checks. Note that since the `PDFViewer`-constructor always sets this callback-function, e.g. the viewer-component examples still work as-is.
Now that font-substitution has been implemented, we should be able to do a much better job at supporting non-embedded Wingdings fonts.
Given that this is a Windows-specific font, see https://en.wikipedia.org/wiki/Wingdings, this is however not guaranteed to work (well) on other platforms.
The affected font is non-embedded ZapfDingbats, however the PDF document for some inexplicable reason specifies the encoding as "WinAnsiEncoding" (which is obviously wrong).
To work around this bug in the PDF generator, we'll simply ignore any explicitly specified named encoding for non-embedded symbol fonts.
- Remove the dependency on fit-curve;
- Improve the way the current line is drawn, by using a Path2D and
  by clearing only the last part of the curve instead of clearing
  the whole canvas;
- Smooth the curve while drawing, to avoid changes after
  the drawing ends;
- Make the smoothing a bit less aggressive.
Given that inline images may contain "EI"-sequences in the image-data itself, actually finding the end-of-image operator isn't always straightforward.
Here we extend the implementation from PR 12028 to potentially check all of the following bytes, rather than stopping immediately. While we have fairly decent test-coverage for this code, whenever you're changing it there's unfortunately a slightly higher than normal risk of regressions. (You'd really wish that PDF generators just stop using inline images.)
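For illustration, here's a much-simplified sketch of the kind of scan involved (this is not the actual PDF.js parser code, which performs additional validation of the bytes following a candidate match):

```js
// Find a candidate end-of-image marker: an "EI" sequence delimited by
// whitespace. Real-world image data can contain "EI" inside the bytes
// themselves, which is why the actual implementation keeps validating
// subsequent bytes instead of accepting the first match unconditionally.
function findEndOfImage(bytes, start) {
  const isWhitespace = b =>
    b === 0x20 || b === 0x0a || b === 0x0d || b === 0x09;
  for (let i = start; i < bytes.length - 1; i++) {
    if (
      bytes[i] === /* E = */ 0x45 &&
      bytes[i + 1] === /* I = */ 0x49 &&
      (i === start || isWhitespace(bytes[i - 1])) &&
      (i + 2 === bytes.length || isWhitespace(bytes[i + 2]))
    ) {
      return i; // candidate only; more checks are needed in practice
    }
  }
  return -1;
}
```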
- if the contour count is lower than -1, the glyph is very likely wrong,
so just remove it from the font;
- if a contour has the repeat flag set, then the repeat count mustn't be 0.
Looking at the behaviour in Adobe Reader it doesn't appear that attachments are sorted alphabetically, hence it doesn't seem necessary for us to do so either in the viewer.
An additional benefit of *not* sorting the attachments is that any "actual" attachments are now always placed at the top of the list in the sidebar, and if any `FileAttachment`-annotations exist in the document they will now be appended at the end.
The PDF linked in bug 1135277 contains a lot of stroke instructions.
According to the Firefox profiler, this patch reduces the overall
time spent in this function by 30%.
According to https://en.wikipedia.org/wiki/Impact_(typeface) this font should be available on all current versions of Windows, and with the recently added font-substitution we should actually be able to render it correctly (at least on Windows).
The `fontID` handling is quite old and predates the use of the `idFactory` to generate a unique id for each font, hence we can simplify this code a little bit.
When fixing bug 1766987, I thought the field's formatted value came from
the result of the format callback: I was wrong. The format callback is run,
but the value is unused (maybe it's useful to set some global vars... or
it's just a bug in Acrobat). Anyway, the value to display is the one rendered
in the AP stream.
The field value setter has been simplified, and that fixes issue #16409.
This essentially extends PR 11218 to also apply when looking up the final font-reference, via the XRef-table, fails because the font isn't available.
This patch also changes `PartialEvaluator.fallbackFontDict` to simply use "Helvetica" as the default font-name, since that seems generally reasonable given the now existing font-substitution code.
After PR 12563 we're now free to use optional chaining in the worker-thread as well. (This patch also fixes one previously "missed" case in the `web/` folder.)
For the MOZCENTRAL build-target this patch reduces the total bundle-size by `1.6` kilobytes.
Given that the `css` property isn't constant, since it contains document/font ids, we cannot just check it directly. However, we can make use of regular expressions to ensure that the format is generally correct.
Despite this being a *major* version increase, it doesn't appear to require any updates in our test-suites.
Note in particular that the minimum supported browsers/environments were updated, however this isn't a problem given our recent support-changes in the PDF.js library.
Please find additional details at https://github.com/jasmine/jasmine/blob/main/release_notes/5.0.0.md
Originally the `PDFSidebarResizer` class was slightly larger, since the code used to contain e.g. feature testing for older (and no longer supported) browsers.
Given that there's some amount of overlap, when it comes to what DOM-elements and state that these classes need, it now seems reasonable to simply move the sidebar-resizing into the `PDFSidebar` class.
For the MOZCENTRAL build-target this patch reduces the size of the *built* `web/viewer.js` file by just over `1.1` kilobytes.
Similar to other toolbar/secondaryToolbar buttons that open toolbars or dialogs, it seems reasonable to use "aria-controls" for the editor-toolbar buttons as well.
On my computer, it takes a few tenths of a second to load a local font.
Since a font can be used several times in a document, the cache will
improve performance.
- Replace FoxitSans with LiberationSans: LiberationSans is already there (for XFA) and we can use
it as a good replacement for FoxitSans.
- For now we only try to substitute the standard fonts; the strategy is the following (see the sketch after this list):
  * we try to find a font locally, from a hardcoded list;
  * if that fails, we use Liberation as a fallback (only for Helvetica for the moment);
  * else we just fall back on the system serif/sans-serif/monospace font.
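A hedged sketch of that strategy; the function name, the local-font list, and the returned shape below are illustrative, not the actual PDF.js implementation:

```js
// Hypothetical hardcoded list of local candidates per standard font.
const LOCAL_SUBSTITUTIONS = new Map([
  ["Helvetica", ["Helvetica", "Arial"]],
]);

function getFontSubstitution(baseFontName) {
  // 1. Candidates to try locally, expressed as @font-face local() sources.
  const src = (LOCAL_SUBSTITUTIONS.get(baseFontName) || []).map(
    name => `local("${name}")`
  );
  // 2. If none of those load, fall back on bundled Liberation (Helvetica
  //    only for now) — i.e. the last entry of the @font-face src list.
  if (baseFontName.startsWith("Helvetica")) {
    src.push(`url("LiberationSans-Regular.ttf")`);
  }
  // 3. Otherwise a generic system font is used instead.
  return src.length ? { src: src.join(",") } : { css: "sans-serif" };
}
```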
This patch updates the minimum supported browsers as follows:
- Safari 15.4, which was released on 2022-03-15; see https://en.wikipedia.org/wiki/Safari_version_history#Safari_15
Nowadays we usually try, where feasible and possible, to support browsers that are about two years old. The reasons for limiting support to a *somewhat* more recent Safari version include:
- Throughout the history of the PDF.js project, Safari has always been the worst browser to attempt to support. Compared to other browsers there's a disproportionate number of bugs affecting Safari, especially on iOS, and in most cases those are browser-specific issues that we simply cannot address.[1]
- Safari has often been a lot slower, compared to other browsers, at implementing new web-platform features. Historically this has sometimes blocked usage of new features, for the benefit of the Firefox PDF Viewer, and it's very often meant having to include and maintain polyfills *only* for Safari.
- The current (minimum) supported Safari version lacks enough functionality that polyfills placed in the `src/shared/compatibility.js` file are unfortunately not sufficient; it also requires a bunch of special-cases in both the `gulpfile` and in the `web/`-code.
- Given that the *built-in* Firefox PDF Viewer is the primary development target for the PDF.js library, and the general development pace these days, we need to limit the maintenance "overhead" caused by other browsers.
---
[1] In a few cases a work-around might be possible, however it'd negatively affect e.g. performance, readability, and/or maintainability of the code.
Originally we only used the `structuredClone` polyfill in the `LoopbackPort`-implementation, and that obviously isn't used anywhere within the various image decoders.
At this point in time we've started to use `structuredClone` a little bit more, hence it seems overall simpler to just bundle the polyfill even in the `legacy`-version of the IMAGE_DECODERS built-target.
For some time these checks have only targeted Node.js environments, since the features in question exist in all supported browsers (even when a `legacy`-build is used).
Now that we've updated the minimum supported Node.js version to 18, a number of polyfills are thus (finally) no longer necessary in that environment. Hence for certain *basic* functionality, such as e.g. text-extraction, it's now possible to use either a modern- or a `legacy`-build of the PDF.js library in Node.js environments.
*Please note:* For e.g. canvas-rendering in Node.js environments it's still necessary to use a `legacy`-build, since that functionality requires various polyfills.
This patch updates the minimum supported environments as follows:
- Node.js 18, which was released on 2022-04-19; see https://en.wikipedia.org/wiki/Node.js#Releases
Note also that Node.js 16 will soon reach EOL, and thus no longer receive any security updates.
The /Decode-implementation in our JPEG decoder, i.e. `src/core/jpg.js`, seems to only handle *inverting* of images properly. To support arbitrary /Decode-entries correctly we'll always use the `PDFImage.decodeBuffer` method, even for "simple" JPEG images, which should be fine since non-default /Decode-entries aren't a very common occurrence.
*Please note:* This patch will lead to a little bit of movement in some existing test-cases, however it should be virtually imperceptible to the naked eye.
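As a reminder of what generic /Decode handling entails, here's a simplified sketch for the 8-bit single-component case (the actual `PDFImage.decodeBuffer` method is more general):

```js
// Map each 8-bit component value through the [Dmin, Dmax] range given by
// the /Decode array; e.g. decode = [1, 0] inverts a grayscale image.
function applyDecode8bit(buffer, [dmin, dmax]) {
  for (let i = 0; i < buffer.length; i++) {
    buffer[i] = 255 * (dmin + (buffer[i] / 255) * (dmax - dmin));
  }
}
```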
The fallback code-path has never really been used, since the `PDFSidebar` is only used in the default viewer (and has never been exposed in e.g. the COMPONENTS-build).
This patch tries to simplify, and improve, the thumbnail styling:
- For rendered thumbnails there's one less DOM-element per thumbnail, which can't hurt in longer documents.
- Use CSS-variables to set the dimensions of all relevant DOM-elements at once.
- Simplify the visual styling of the thumbnails, e.g. remove the border, since the viewer no longer has visible borders around pages and the relevant CSS-rules are quite old.
These changes also, at least in my opinion, make the relevant CSS-rules much easier to understand and work with.
- Make it easier to work on e.g. [bug 1690428](https://bugzilla.mozilla.org/show_bug.cgi?id=1690428) without affecting the other sidebarViews.
This property was added in PR 12726 specifically for use in the `getFontType` function, indirectly used by the `PDFDocumentProxy.stats` getter in the API.
In PR 15880 that functionality was removed, but I forgot to remove this now unused font-property.
Now that we no longer depend on the old Babel version in SystemJS we can remove the `static get ...` work-arounds used to define constants, which leads to slightly more compact code.
Now that https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 has landed in Firefox, we're able to use worker-modules during development :-)
This removes the final piece of SystemJS usage from the PDF.js library, thus allowing a fair bit of clean-up, and we now use *only* native `import`/`export` statements everywhere in development mode.
When the `GlobalImageCache` implementation originally landed, back in PR 11912, the image handling was slightly more complex (with e.g. browser-decoding of some JPEG images). At this point it no longer seems necessary to manually handle pageIndexes in this way, and we should be able to simply inline that in the `GlobalImageCache.shouldCache` method.
This commit migrates this functionality away from the bots. Nowadays
it's possible to build and deploy the website to GitHub Pages directly
through the GitHub Actions, which provides a nice simplification of the
process. Not only does this remove the requirement to have a `gh-pages`
branch in the repository, it also avoids the complexity of having to
configure the workflow to commit to Git branches and allows us to remove
the Git committing code from the Gulpfile.
Note that deploying directly through GitHub Actions workflows needs to be
enabled in the repository settings, but this is easy and well documented
on the link below.
The following resources are relevant for this patch:
- Enabling deployment to GitHub Pages directly through GitHub Actions:
https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site#publishing-with-a-custom-github-actions-workflow
- Uploading GitHub Pages artifacts example:
https://github.com/actions/upload-pages-artifact#usage
- Deploying GitHub Pages artifacts example:
https://github.com/actions/deploy-pages#usage
This patch tries to mimic the look of the message-element in the Firefox browser-findbar, and thus makes the following changes:
- Remove the red colour, since it didn't take the light/dark themes into account.
- Display the "notFound" message in bold.
The `pdfjs-dist/lib/` directory contains a README file that explicitly advises against using those files, however based on a fairly large number of issues filed over the years users seem to be (mostly) overlooking that warning.
In particular it unfortunately seems to be somewhat common for users to attempt to "combine" proper builds from `pdfjs-dist/build/` together with individual components from the `pdfjs-dist/lib/web/` directory, which more often than not leads to subtle bugs and general problems.
When we receive bug reports about this it's often not immediately obvious what the problem is, given that many issues lack enough details (such as runnable test-cases), but after some back-and-forth it usually turns out that usage of `pdfjs-dist/lib/` is the culprit.
Considering that keeping the general PDF.js library working is challenging and time-consuming enough nowadays, this patch thus proposes that we stop including the "lib"-build in the `pdfjs-dist` repository to both reduce user confusion and the support burden.
Currently we only prevent triggering the actual text-extraction multiple times in "parallel", when using the "copy all text" feature, however the "copy"-event itself is not prevented.
The result is that if the user selects all text in a long PDF document and then uses the copy-shortcut multiple times in quick succession, we'll actually populate the clipboard with "incomplete" contents (via a `TextLayerBuilder` copy-listener) until all text-extraction finishes.
In PR #16295 one occurrence of this was changed, but a few more remained
in the codebase. This commit fixes the other occurrences so that we
don't use the deprecated way of creating custom events anywhere anymore.
According to MDN, see https://developer.mozilla.org/en-US/docs/Web/API/CustomEvent/initCustomEvent,
using the `CustomEvent.initCustomEvent` method is deprecated and the
`CustomEvent` constructor should be used instead.
Extends d9bf571f5c49e1cac9054cf6b7acfc0b5b719876.
In PR #16327 the `eslint-plugin-mozilla` package was updated so we no
longer have to force-install packages, and the force-install flags for
`npm install` were removed. However, the CI job was missing from this
commit, which we fix here. In general force-installing packages
shouldn't be necessary unless there are problems with dependencies,
which we would like to know about, so especially in the CI job it seems
like a good idea to not force-install packages to catch upcoming defects
early on.
Extends 19526d2322fabd4425688bb7c5504fa9ea015c5c.
This method was added in PR 4938, almost nine years ago, however it doesn't appear to ever have been used.
Given the similarities between the `PDF17` and `PDF20` classes, and how they're used, if the `PDF20.hash` method was actually necessary you'd also expect a similar method in the `PDF17` class.
The "binary" CMap-format is specific to the PDF.js library, and is used to reduce the size of the built-in CMap data-files.
By moving this code to its own file we can remove the nowadays unnecessary closures, which helps to slightly reduce the size of this code.
The latest version of `eslint-plugin-mozilla` removed the Prettier dependency, see https://bugzilla.mozilla.org/show_bug.cgi?id=1677562, which means that we no longer need to use `npm install --force` in the PDF.js library.
When permissions are enabled and the PDF document doesn't have the COPY-flag set, it shouldn't be possible for the user to trigger the "copy all text" feature.
While this slightly reduces duplication in the CSS rules, some of the auto-formatting done by Prettier is perhaps not great. (Given the overall advantage of using Prettier, we'll probably have to simply accept this.)
Hopefully these changes make sense (since this functionality is new to me), however the existing `xfa`-tests should help avoid any outright regressions.
Some Arabic chars, like \ufe94, can be searched for in a PDF, hence they must be normalized
when creating the search query. To avoid duplicating the normalization code,
everything is moved into the find controller.
The previous code to normalize text was using NFKC but with a hardcoded map; it
has been replaced by the use of normalize("NFKC"), which helps to reduce the bundle size
by 30kb.
In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into
account some RTL unicode ranges, the generated font wasn't embedding the mapping for this
char, and the unicode ranges in the OS/2 table weren't up-to-date.
When normalized, some chars can be replaced by several ones, which leads to
extra chars in the text layer. To avoid any regression, when copying some text
from the text layer, the copied string is normalized (NFKC) before being put in the
clipboard (it works like this in both Acrobat and Chrome).
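As a concrete example of the normalization involved (the mapping below follows directly from Unicode NFKC):

```js
const finalForm = "\ufe94"; // ARABIC LETTER TEH MARBUTA FINAL FORM
console.log(finalForm.normalize("NFKC") === "\u0629"); // true: the canonical char
// The same normalize("NFKC") call is applied to copied text before it's
// placed on the clipboard.
```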
After the previous patch we now have only *a single* `PRODUCTION` occurrence in the entire code-base, more specifically in the `web/viewer.html` file.
This special build-target can be replaced with any condition that always evaluates to `false`, such as e.g. a comment.
*Please note:* This patch might be considered too hacky, hence I completely understand if it's rejected.
This *special* build-target is very old, and was introduced with the first pre-processor that only uses comments to enable/disable code.
When the new pre-processor was added `PRODUCTION` effectively became redundant, at least in JavaScript code, since `typeof PDFJSDev === "undefined"` checks now do the same thing.
This patch proposes that we remove `PRODUCTION` from the JavaScript code, since that simplifies the conditions and thus improves readability in many cases.
*Please note:* There's not, nor has there ever been, any gulp-task that set `PRODUCTION = false` during building.
To make this functionality work out-of-the-box in custom implementations, see e.g. the "viewer components" examples, it'd be slightly easier if we dynamically create/insert the "hiddenCopyElement" in the `PDFViewer` constructor.
Given that the "copy all text" feature still appears to work just as before with this patch, hopefully I'm not overlooking any reason why doing this would be a bad idea.
I was playing with the new "copy all text" feature, and stumbled upon one document where the copied text was truncated; see http://mirrors.ctan.org/info/lshort/english/lshort.pdf
The problem turns out to be that on [page 83](https://ftp.acc.umu.se/mirror/CTAN/info/lshort/english/lshort.pdf#page=83) the textLayer contains `\u0000` and apparently copying just stops when a null char is encountered.
To fix this we can simply use an existing helper function, and with this patch we're able to successfully copy all the text in that document.
*Please note:* This patch only extends the `PDFFindController` implementation itself to support this functionality, however it's *purposely* not exposed in the default viewer.
This replaces the previous `phraseSearch`-parameter, and a `query`-string will now always be interpreted as a phrase-search.
To enable searching for individual words, the `query`-parameter must instead consist of an Array of strings. This way it's now also possible to combine phrase/word searches, with a `query`-parameter looking something like `["Lorem ipsum", "foo", "bar"]`, which will search for the phrase "Lorem ipsum" *and* for the individual words "foo" and "bar".
This method was originally added in PR 1320, eleven years ago, however it doesn't appear to ever have been used (not even from the start).
Furthermore, this method also tries to access a property that doesn't exist (`this.out`) and then call a method that also doesn't exist (`writeByteArray`).
In looking at https://bugs.ghostscript.com/show_bug.cgi?id=706451 I noticed that bug2.pdf was pretty
slow to load for such a basic file.
In profiling I noticed that a lot of time is spent in Array.concat, hence this patch uses Array.push where
possible (it's now ~3 times faster).
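The change is essentially the following pattern (a sketch, not the exact code from the patch):

```js
function flattenChunks(chunks) {
  const result = [];
  for (const chunk of chunks) {
    // Before: result = result.concat(chunk); — allocates a brand new array
    // on every iteration, which is what showed up in the profile.
    // After: append in place. (For very large chunks a plain indexed loop
    // avoids the spread-argument limit.)
    result.push(...chunk);
  }
  return result;
}
```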
The changes in PR 16238 were intended specifically for Node.js environments, however they accidentally applied to older browsers as well.
*Please note:* In up-to-date browsers `Path2D` is available in Workers, which should be connected to the introduction of `OffscreenCanvas`.
By getting the width/height of the first page initially, we can slightly reduce the amount of code needed both in the `hasEqualPageSizes`-check and when building the print-styles.
For the moment there is no real consensus on how we should download a PDF on Android.
Hence we keep this solution for now, but behind a pref (which will be `true` on
Nightly only).
Currently we repeat the same code in lots of places, to update the "toggled" class and "aria-checked" attribute, when various toolbar buttons are clicked.
For the MOZCENTRAL build-target this patch reduces the size of the *built* `web/viewer.js` file by just over `1.2` kilobytes.
Apparently the `structuredClone` polyfill doesn't handle transfers correctly, and `DOMException`s may thus be thrown. This is particularly problematic in Node.js environments, where that exception (obviously) isn't available.
To work around these issues we'll simply ignore any transfers in `legacy`-builds, since those *may* use the `structuredClone` polyfill. This will obviously lead to slightly higher memory usage in those builds, however this really only affects Node.js environments. (Browsers are only affected if workers are disabled, however that's never been an officially recommended/supported configuration.)
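The work-around amounts to something like the following sketch (the `isLegacyBuild` flag stands in for the actual build-time check):

```js
function cloneForWorker(value, transfer, isLegacyBuild) {
  if (isLegacyBuild) {
    // The structuredClone polyfill may be in use: ignore `transfer` and let
    // the data be copied, at the cost of higher memory usage.
    return structuredClone(value);
  }
  return structuredClone(value, { transfer });
}
```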
Currently `float: inline-start/inline-end` is only supported in Firefox, see https://developer.mozilla.org/en-US/docs/Web/CSS/float#browser_compatibility, and in order to support other browsers we're thus forced to jump through some hoops.
This leads to slightly less nice code in the *built-in* Firefox PDF Viewer, and this patch attempts to improve the current situation:
- Use Stylelint to forbid direct use of `float: inline-start/inline-end` in the CSS files, to prevent future bugs in the general PDF.js viewer.
- Do a build-time replacement, only in MOZCENTRAL builds, to replace the CSS-variables with raw `float: inline-start/inline-end` instances.
Currently we have two separate image-caches on the worker-thread:
- A local one, which is unique to each `PartialEvaluator.getOperatorList` invocation. This one caches both names *and* references, since image-resources may be accessed in either way.
- A global one, which applies to the entire PDF document and all its pages. This one only caches references, since nothing else would work.
This patch introduces a third image-cache, which essentially sits "between" the two existing ones. The new `RegionalImageCache`[1] will be usable throughout a `PartialEvaluator` instance, and consequently it *only* caches references, which thus allows us to keep track of repeated image-resources found in e.g. different /Form and /SMask objects.
---
[1] For lack of a better word, since naming things is hard...
This effectively implements some of the changes from https://phabricator.services.mozilla.com/D170496, but using our existing "direction aware" CSS-variable to limit the amount of code changes needed.
The patch changes the minimum supported version of Google Chrome as follows:
- Chrome 88, which was released on 2021-01-19; see https://en.wikipedia.org/wiki/Google_Chrome_version_history
This is done to allow use of modern CSS features, such as e.g. `:is()` and `:where()` in the code-base.
When installing the PDF.js project itself it's currently necessary to use `--force` in order for all packages to install correctly, see issue 15429, hence the same is also necessary when using the `gulp dist-install` command for local development/testing.
*Please note:* This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons.
This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous *default* text-extraction behaviour.
However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in *more* (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together.
Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.
With the changes in PR 16153 we're no longer setting a `<base href>` in the Firefox PDF Viewer, hence it shouldn't be necessary to keep setting a `baseUrl` in the `PDFLinkService`-class.
Given that the original document URL is now kept, the browser itself will handle relative URLs and we can thus slightly reduce the amount of string parsing required when handling various links in the viewer.
Currently if you e.g. enable the `useOnlyCssZoom` option rendering may no longer finish as intended. To reproduce:
- Enable the `useOnlyCssZoom` option.
- Load https://github.com/mozilla/pdf.js/files/1522715/wuppertal_2012.pdf (in the development viewer).
- When rendering starts, *immediately* change the zoom-level.
In this case the document will never finish rendering, since the `postponeDrawing`-functionality will (here incorrectly) abort rendering and with CSS-only zooming rendering is only expected to happen once per page.
To fix this we'll simply ignore any `drawingDelay` when CSS-only zooming is used (regardless if it's triggered via the option or the zoom-level being very large).
Currently the `zoomLayer` isn't rotated correctly in all cases. To reproduce:
- Load https://github.com/mozilla/pdf.js/files/1522715/wuppertal_2012.pdf
- Let the document render.
- Rotate the document *four* times, such that the original rotation is restored.
The easiest solution, as far as I can tell, is that we always set the `transform` just as we did (for years) prior to the changes in PR 15812.
Originally we used helper functions for checking if something was a Dictionary or Stream, and then having an initial `typeof` check probably made sense.
However, given that we're using `instanceof` nowadays, the additional check no longer seems necessary.
Currently we're *virtually* duplicating the same code, for validating quotation marks, twice in this helper function.
The size decrease is quite small (107 bytes) and this makes the code slightly harder to read, hence I completely understand if this patch is rejected.
Given that this functionality only applies in the viewer, when `PDFBug` is being enabled and used, it can't hurt to slightly reduce the size of this code.
- Reduce a little bit of duplication by enforcing the max/min scale-values once, at the end, in the `increaseScale`/`decreaseScale` methods.
- Convert the "private" `PDFViewer` scale-related methods into actually private ones, now that JavaScript supports that.
Having just reviewed a patch touching this code, I couldn't help noticing that an `Object` isn't really the optimal data-structure for this and nowadays we can do better by using a `Set` instead.
Given that the viewer always set the `dir`-attribute, to either LTR or RTL, we should be able to use this logical CSS property to (very slightly) reduce the size of the CSS; please see https://developer.mozilla.org/en-US/docs/Web/CSS/inset-block
This is something that I completely overlooked in PR 16162, which in some cases causes the default viewer to incorrectly print warnings.
This can be reproduced with the PAGE scrolling-mode, and/or the PresentationMode, and this patch simply works around it by checking the visibility as well (since the warning is a best-effort solution anyway).
The `pageColors`-option was removed from the `CanvasGraphics`-constructor in PR 16075, hence the code in the API no longer needs to pass in that option; this is something that I missed during review.
The signatures of these methods were changed in PR 15886, which has now been included in a couple of releases, hence it should hopefully be OK to remove the fallback code-paths now.
Also, the methods are updated slightly to be explicit about what options are supported and we'll no longer pass along any arbitrary options to the "private" methods.
Some of these pre-processor statements are *many* years old, and could thus do with some clean-up. Note that the pre-processor originally didn't support else-if statements, and by using those the code becomes a bit less verbose.
The idea is to apply an overall filter on each page: the main advantage
is to have filtered images, which could help make the content visible to
some users.
During review of PR 16151 this method was simplified, however I overlooked the fact that we now can (and really should) improve this by removing duplication.
Unfortunately I don't believe that we can simply add a default `--scale-factor` CSS-variable to the `container`-element, since that might not be entirely appropriate/correct in all cases.[1]
However, we can at least print a console-error to hopefully make this situation more apparent to users. (This is purposely not using the `warn` helper-function, since those messages can be disabled.)
---
[1] One example is in our reference-tests, where we don't need to add it to the `container`-element itself.
With the previous commit this is now completely unused in the API, hence it can be removed. This is done in a separate commit to make it easier to re-instate it, should the need ever arise.
This patch extends PR 16115 to work in all browsers, regardless of their `OffscreenCanvas` support, such that transfer functions will be applied to general rendering (and not just image data).
In order to do this we introduce the `BaseFilterFactory` that is then extended in browsers/Node.js environments, similar to all the other factories used in the API, such that we always have the necessary factory available in `src/display/canvas.js`.
These changes help simplify the existing `putBinaryImageData` function, and the new method can easily be stubbed-out in the Firefox PDF Viewer.
*Please note:* This patch removes the old *partial* transfer function support, which only applied to image data, from Node.js environments since the `node-canvas` package currently doesn't support filters. However, this should hopefully be fine given that:
- Transfer functions are not very commonly used in PDF documents.
- Browsers in general, and Firefox in particular, are the *primary* development target for the PDF.js library.
- The FAQ only lists Node.js as *mostly* supported, see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
The tag <base> is used to resolve relative URIs within the document.
The newly added SVG filters use a relative URI, which is then resolved against
the URI in <base>, but that one mismatches the document URI, and consequently
the filters are not found in the Firefox viewer.
So this patch just removes <base> and replaces a few relative URLs with
absolute ones.
In the general PDF.js library multiple PDF documents may be opened on the same web-page, which is why we many years ago started using document-specific identifiers to prevent issues with global data such e.g. with fonts.
Hence we need to treat the identifiers generated by the `FilterFactory` in the same way, since the SVG-filters for two separate PDF documents may otherwise get identical ids.
PDF gradients do not have color stops but an arbitrary PDF function of
the type f(t) -> color. CSS gradients are only based on color stops.
Most PDF gradient functions are produced from color stop oriented
gradients.
Take advantage of this by sampling the PDF function at a higher
frequency but not converting any samples which could be interpolated to
color stops. The sampling frequency is chosen to be the least common
multiple of as many values as practical to exactly re-create the common
case of the PDF function implementing equally spaced linearly
interpolated stops in RGB color space. This also allows for better
approximation of other smooth PDF functions (non-linear, or non-equally
spaced, or in different color space).
Fixes: #10572, #14165
The current value originated in PR 2317, and in the decade that has passed the amount of RAM available in (most) devices should have increased a fair bit.
Nowadays we also do a much better job of detecting repeated images at both the page- and document-level, which helps reduce overall memory-usage in many documents.
Finally the constant is also moved into the `src/shared/util.js` file, since it was implicitly used on both the main- and worker-thread previously.
Currently in PDF documents with large images we immediately cleanup once rendering has finished, in order to reduce memory-usage.
Normally that shouldn't be a big problem, however when e.g. repeated zooming happens in the viewer that could easily lead to a lot of wasted resources (and waiting).
Hence this patch, which introduces a new `PDFPageProxy` method that will slightly delay cleanup after rendering.
The dimensions still need to be fixed (from time to time they're in px),
but it doesn't have to be postponed anymore.
To test it: draw something, and when resizing look at the dimensions of the div
in devtools; the units must be %.
This simply extends the approach in PR 10727 to also cover Patterns, which shouldn't be a common occurrence in Type3 fonts (since this is the first issue we've seen).
This patch updates the minimum supported environments as follows:
- Node.js 16, which was released on 2021-04-20; see https://en.wikipedia.org/wiki/Node.js#Releases
Note also that Node.js 14 will very soon reach EOL, and thus no longer receive any security updates.
This was deprecated in PR 15758, which has now been included in three official PDF.js releases.
While PR 15880 did limit the bundle-size impact of this functionality on e.g. the Firefox PDF Viewer, it still leads to some unnecessary "bloat" that these changes remove.
Furthermore, with this being deprecated there'd also be no effort put into e.g. extending the `UNSUPPORTED_FEATURES` list when handling future error cases.
The idea is to encode large images in the BMP format (which is very simple and doesn't
require computing any checksums) and then use createImageBitmap with a BMP blob
(which doesn't suffer from the Canvas/ImageData limits).
From a performance point of view it isn't great (generating a large blob + decoding
it on the main thread is really not ideal), but at least we have something to display,
which is way better than a blank page (and one can notice that most of the time is
spent in decoding the image from the PDF stream).
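The decoding side of the idea is small enough to sketch (the BMP header construction is omitted; `bmpBytes` is assumed to be a complete, already-encoded BMP file):

```js
async function decodeLargeImage(bmpBytes) {
  // Blob + createImageBitmap bypasses the Canvas/ImageData size limits that
  // a putImageData-based approach would run into.
  const blob = new Blob([bmpBytes], { type: "image/bmp" });
  return createImageBitmap(blob);
}
```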
PDF 32000-1:2008 7.10.5.1 "Type 4 (PostScript Calculator) Functions"
defers to the PostScript Language Reference for the description of these
functions. The PostScript Language Reference, third edition chapter 8
"Operators" defines the `angle` type as a "number of degrees". Section
8.1 defines "angle `sin` real", "angle `cos` real", and "num den `atan`
angle". The documentation for `atan` further states that it will return
an angle in degrees between 0 and 360.
Handle these operators correctly in `PostScriptEvaluator.execute`.
Convert the inputs to `sin` and `cos` from degrees to radians for use
with `Math.sin` and `Math.cos`. Correctly pop two values from the stack
for `atan`, use `Math.atan2`, and convert from radians to (positive)
degrees.
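A sketch of the degree-based handling described above (the stack-manipulation details differ from the actual `PostScriptEvaluator.execute` code):

```js
const DEG_TO_RAD = Math.PI / 180;

function psSin(stack) {
  stack.push(Math.sin(stack.pop() * DEG_TO_RAD));
}
function psCos(stack) {
  stack.push(Math.cos(stack.pop() * DEG_TO_RAD));
}
function psAtan(stack) {
  const den = stack.pop(); // "num den atan angle": den is on top of the stack
  const num = stack.pop();
  let degrees = Math.atan2(num, den) / DEG_TO_RAD;
  if (degrees < 0) {
    degrees += 360; // atan must return an angle between 0 and 360 degrees
  }
  stack.push(degrees);
}
```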
This was deprecated in PR 15943, which has now been included in two official PDF.js releases.
Given that `PDFDataRangeTransport` is somewhat unlikely to be used outside of the *built-in* Firefox PDF Viewer, it doesn't seem necessary to wait longer before removing this.
Also, removes the specific error-message for GENERIC builds to not unnecessarily "advertise" using non-objects when calling the `getDocument`-function.
*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
We introduced the use of OffscreenCanvas in #14754 and this patch aims
to use it for all kinds of images.
It'll slightly improve performance (and maybe slightly decrease memory use).
An image can be rendered using some transfer maps, but because of
OffscreenCanvas we don't have access to the underlying pixel array, so the
transfer-map handling is re-implemented using the SVG filter feComponentTransfer.
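A rough sketch of building such a filter; the filter id and the table values below are illustrative:

```js
const SVG_NS = "http://www.w3.org/2000/svg";

function buildTransferMapFilter(svgRoot, mapR, mapG, mapB) {
  const filter = document.createElementNS(SVG_NS, "filter");
  filter.setAttribute("id", "transferMapFilter");
  const transfer = document.createElementNS(SVG_NS, "feComponentTransfer");
  for (const [tag, map] of [
    ["feFuncR", mapR],
    ["feFuncG", mapG],
    ["feFuncB", mapB],
  ]) {
    const func = document.createElementNS(SVG_NS, tag);
    func.setAttribute("type", "table");
    // tableValues expects normalized samples, e.g. "0 0.5 1".
    func.setAttribute("tableValues", map.join(" "));
    transfer.append(func);
  }
  filter.append(transfer);
  svgRoot.append(filter);
  // The canvas context can then reference it:
  //   ctx.filter = "url(#transferMapFilter)";
}
```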
Rather than repeatedly initializing a `canvasFactory`-instance for every page, move it to the document-level instead.
*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
Currently we repeat the `FeatureTest.isOffscreenCanvasSupported` checks all over the worker-thread code, and with upcoming changes this will become even "worse".
Hence this patch, which changes the *worker-thread* default value for the `isOffscreenCanvasSupported`-parameter to `false` and moves the feature-testing into the `BasePdfManager`-constructor.
*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
Currently some `getCtx` calls will have `isOffscreenCanvasSupported === undefined` set, meaning that `OffscreenCanvas` isn't being used as intended, since no `TextLayerRenderTask._isOffscreenCanvasSupported` property exists.
*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
I noticed several 'Path not found' errors because of a field called #subform[2].
From the XFA specs, the hash is used for a class of elements in the template tree.
When we're looking for a node in the datasets tree, it doesn't make sense to search
for a class, hence path elements starting with a hash are just skipped.
In order to help identify a link, we add a border around it with the LinkText color.
And the backdrop colors are inverted when the mouse pointer hovers over a link; this way
it should help to identify the link where the pointer is.
With upcoming changes we'll potentially start to cache `ImageBitmap` data at the document-level, in addition to just at the page-level.
Hence we need to ensure that such data is actually released on clean-up, and rather than duplicating the existing *manual* handling this code is instead moved into the `PDFObjects.clear` method. (In my opinion, this is an overall improvement even without globally cached `ImageBitmap` data.)
*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it's correct and makes sense.
The `Buffer`-object is Node.js specific functionality[1], thus (obviously) not found in browsers. Please note that the PDF.js library has never officially supported/documented that binary data can be passed as a `Buffer`, and that *internally* in the `src/core`-code we only work with standard `Uint8Array`s.
This means that if, in Node.js environments, a `Buffer` is passed to the API we need to wrap it into a `Uint8Array`, which essentially means creating a copy of the data and thus increasing memory usage.
---
[1] Refer to https://nodejs.org/api/buffer.html#buffer
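The wrapping itself is a one-liner, sketched here with an illustrative helper name:

```js
function asUint8Array(data) {
  // `Buffer` only exists in Node.js; in browsers this branch is never taken.
  if (typeof Buffer !== "undefined" && Buffer.isBuffer(data)) {
    return new Uint8Array(data); // copies the bytes into a standard Uint8Array
  }
  return data;
}
```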
- Pass the `URL`-object directly to `getDocument`, since that's been supported since PR 13166.
- Remove support for the `disableRange`-option in the test-manifest, since it's completely unused. Please note that it's originally added in PR 2719, however there's never actually been any reference tests using it (not even from the start).
Given that the option is `false` by default everywhere (e.g. in the Firefox PDF Viewer) and that we have unit-tests for `disableRange = true`, it doesn't seem necessary to add new reference tests for it now.
Currently we duplicate the same code more than once in the `test/driver.js` file, which we can avoid by adding a new `AnnotationStorage` helper method instead.
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the *built* `pdf.js` and `pdf.worker.js` files.
Currently these classes take a bunch of parameters (somewhat randomly ordered), probably because this is very old code that's been extended over the years.
Hence this patch changes the constructors to use parameter-objects instead, which improves consistency and (slightly) reduces the amount of code as well.
*Please note:* Also removes the `msgHandler`-property on these classes, since I cannot find a single call-site that accesses it.
Given that the debugging hash-parameters will only be used when the `pdfBugEnabled` option is manually set[1], we can skip a *tiny* bit of asynchronicity for "regular" users.
---
[1] Note that it's enabled by default in the development viewer, i.e. in `gulp server` mode.
Currently this helper function only has two call-sites, and both of them only pass in `ArrayBuffer` data. Given how it's implemented there's a couple of code-paths that are completely unused (e.g. the "string" one), and in particular the intended fast-paths don't actually work.
This patch re-factors and simplifies the helper function, and it'll no longer accept anything except `ArrayBuffer` data (hence why it's also re-named).
Note that at the time when `arraysToBytes` was added we still supported browsers without TypedArray functionality, and we'd then simulate them using regular Arrays.
When printing the PDF in #12233 in Acrobat, we can see that the combo for country
is empty: it's because the V entry doesn't have to be one of the options.
We're using this helper function when reading data from the [`PDFWorkerStreamReader.read`](a49d1d1615/src/core/worker_stream.js (L90-L98)) and [`PDFWorkerStreamRangeReader.read`](a49d1d1615/src/core/worker_stream.js (L122-L128)) methods, and as can be seen they always return `ArrayBuffer` data. Hence we can simply get the `byteLength` directly, and don't need to use the helper function.
Note that at the time when `arrayByteLength` was added we still supported browsers without TypedArray functionality, and we'd then simulate them using regular Arrays.
In the Mac case we don't want to care about the scaleFactor threshold,
because otherwise, if it's too big, another move could start and then subsequent
events aren't considered as wheel events.
It isn't really ideal, and at some point we'll need to find a way, at
least for the Firefox case, to get the real events instead of the fake
wheel ones.
In looking at a profile, I noticed in the Marker chart that there's an animation
for loading-icon.gif even when this icon isn't visible.
This patch doesn't completely remove it, but just slightly postpones it.
This further extends the web-specific import maps introduced in PR 16009, to allow removing *most* of the build-time `require` statements from the viewer. The few remaining ones are fallbacks used for the COMPONENTS respectively the `legacy` GENERIC builds.
After the compatibility updates in PR 15968 it's no longer strictly necessary to build the `viewer.css` file in order for the *development viewer* to work in Chromium-based browsers.
*Please note:* Given that Chromium-based browsers still don't support the *unprefixed* `mask-image` property the icons won't look right, however the development viewer itself works.
Given that Firefox is the *primary* development target, and that running `gulp generic` locally will generate polyfilled CSS, it seems reasonable to make this simplification here.
Currently there's no toolbar in the GV-viewer, hence invoking the pageLabels functionality isn't meaningful and just leads to unnecessary parsing on both the main- and worker-threads. (And if a toolbar is added at some point, it's not clear to me if we'd want to support pageLabels in the GV-viewer anyway.)
Currently we have a couple of pre-processor checks, specifically for the GV-viewer, spread throughout the code. This works fine when *building* the viewer, however they're obviously ignored in development mode (i.e. `gulp server`).
This leads to a situation where the GV development viewer, i.e. http://localhost:8888/web/viewer-geckoview.html, behaves subtly different from its built version. This could easily lead to bugs, hence this patch introduces a development mode constant to hopefully improve things here.
Finally, in a follow-up to PR 15842, also ignores the `pageMode`-state since there's no sidebar available.
Currently there's no UI for this functionality in the GV-viewer, however we still call the API methods. This potentially leads to a bunch of worker-thread parsing, for PDF documents with these features, despite the result being completely unused.
Given that mobile devices are usually more resource constrained than desktop/laptop computers, not to mention battery life, we can avoid doing work that'll just be ignored anyway.
The `DownloadManager.openOrDownloadData` method is written for the default-viewer specifically, assuming a viewer able to handle e.g. URL search/hash parameters. In the viewer components there's obviously no such functionality, and we should thus trigger downloading of PDF attachments directly instead.
Given that the GV-viewer isn't using most of the UI-related components of the default-viewer, we can avoid including them in the *built* viewer to save space.[1]
The least "invasive" way of implementing this, at least that I could come up with, is to leverage import maps with suitable stubs for the GV-viewer.
The one slightly annoying thing is that we now have larger import maps across multiple html-files, and you'll need to remember to update all of them when making future changes.
---
[1] With this patch, the built `viewer.js` size is 391 kB and `viewer-geckoview.js` is 285 kB.
By default we're nowadays using worker-thread fetching (in browsers) of this data, however in Node.js environments, or if the user provides custom factories, we still fall back to main-thread fetching.
Hence it makes sense, as far as I'm concerned, to move this initialization into the `getDocument` function to ensure that the factories can actually be initialized *before* attempting to load the document.
Also, this further reduces the amount of `getDocument` parameters that we need to pass into the `WorkerTransport` class.
Currently we're passing all available parameters to this function respectively class, despite that not actually being necessary.
By splitting the parameters we not only improve the structure, and basically "document" the code a little bit, but we can also simplify the `_fetchDocument` function considerably.
This is very old code, where we loop through the user-provided options and build an internal parameter object. To prevent errors we also need to ensure that the parameters are correct/valid, which is especially important for the ones that are sent to the worker-thread such that structured cloning won't fail.[1]
Over the years this has led to more and more code being added in `getDocument` to validate the user-provided options, and at this point *most* of them have at least basic validation. However the way that this is implemented feels slightly backwards, since we first build the internal parameter object and only *afterwards* validate those parameters.[2]
Hence this patch changes the `getDocument` function to instead check/validate the supported options upfront, and then *explicitly* build the internal parameter object with only the needed properties.
---
[1] Note the supported types at https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#supported_types
[2] The internal parameter object may also, because of the loop, end up with lots of unnecessary properties since anything that the user provides is being copied.
These functions invoke the `PDFViewer.currentPageNumber` setter, which already checks that a `pdfDocument` is currently active. Also, given that they're event handlers for the First/Last-page buttons (in the SecondaryToolbar) they can't be invoked before the viewer has been fully initialized.
The default value of the `--scale-select-width` CSS variable has been chosen such that it should be large enough for most locales. This means that in many locales we don't even update the CSS variable at all, and for those locales where we do the update happens *one time* early during the viewer initialization (i.e. before the PDF document has loaded).
*Please note:* Compared to other recent PRs, the effect of these changes ought to be really tiny and are mostly done to promote better coding patterns.
We should be able to let Jasmine simply compare directly against an actually empty Object, rather than using a manually implemented helper function for that.
A number of methods have their Promises cached, to avoid repeated worker round-trips, since they're expected to be called more than once from the default viewer. The way that the caching is currently implemented means that we need to remember to manually clear these Promises on document cleanup/destruction, and it'd be nice to avoid that.
With this patch the relevant Promises are now instead placed in just one `Map`, which is easy to clear, and a new helper method is also introduced to reduce duplication for *simple* `WorkerTransport` methods.
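A hedged sketch of the pattern; the class shape and method names here are illustrative rather than the exact `WorkerTransport` API:

```js
class TransportExample {
  #methodPromises = new Map();

  constructor(messageHandler) {
    this.messageHandler = messageHandler;
  }

  #cachedMessage(name, data = null) {
    // Each worker round-trip only ever happens once per document.
    let promise = this.#methodPromises.get(name);
    if (!promise) {
      promise = this.messageHandler.sendWithPromise(name, data);
      this.#methodPromises.set(name, promise);
    }
    return promise;
  }

  getOutline() {
    return this.#cachedMessage("GetOutline");
  }

  destroy() {
    // One single place to drop every cached promise on cleanup/destruction.
    this.#methodPromises.clear();
  }
}
```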
The reasons for making this change are:
- There's no UI available to toggle the cursor-tools in the GeckoView-specific viewer.
- The `HandTool`-implementation basically *simulates* touch scrolling, and is thus unlikely to be helpful/useful anyway.
- PR 15831 already changed the relevant call-sites to handle `PDFViewerApplication.pdfCursorTools` being undefined.
Some of the code in this method is *very* old, and we could thus modernize it a little bit by removing a couple of the loops used to build the `getDocument` argument.
These `@media` rules were most likely just copy-pasted from the regular viewer, however none of them are currently necessary since the GeckoView-specific viewer doesn't have any toolbars.
Note that the whole purpose of these CSS rules is to make the toolbar, of the regular viewer, responsive. If we in the future add toolbars for the GeckoView-specific viewer, these rules most likely wouldn't be usable as-is anyway.
This option was added specifically for third-party users, but has never been used in the PDF.js project itself. Furthermore there's no preference that can be used to enable it, and you need to provide the `removePageBorders` option when initializing a `PDFViewer`-instance.
This patch thus gets rid of a little bit more unused code in the Firefox PDF Viewer.
*Unfortunately I missed this during testing/reviewing of PR 15992.*
With the changes in PR 15992 we're now only adding the `loadingIcon`-class when rendering is actually `RUNNING`, in order to improve overall performance.
However when resetting the page, i.e. the `INITIAL` state, we also need to remove the `loadingIcon` completely. Without this patch if you scroll through a document where the pages don't load instantaneously, see e.g. issue 2504, we'll leave the `loadingIcon`-class attached to pages that have had their rendering cancelled *and* also been evicted from the `PDFPageViewBuffer`-instance.
This way we don't have a lot of useless divs, and we let the CSS engine handle the
creation/destruction of the :after pseudo-element.
It'll help to slightly improve performance when zooming.
The only parameter that we actually need here is the `PDFDataRangeTransport`-instance, since the others are not necessary.
- The `url` parameter, as passed to the `getDocument` function in the API, is simply being ignored; see 2d87a2eb1c/src/display/api.js (L447-L458)
- The `length` parameter, as passed to the `getDocument` function in the API, is always being overwritten; see 2d87a2eb1c/src/display/api.js (L519-L525)
Until PR 12563 is deemed safe to land, I'd still like to be able to use worker-modules in the viewer during local development.
Hence this patch which *temporarily* adds a new `workerModules` hash-parameter, only available in non-PRODUCTION mode, that allows using worker-modules in the development viewer.
To enable this functionality, simply use http://localhost:8888/web/viewer.html#workerModules=true
The initial CMap support was added in PR 4259 using the "raw" Adobe files, however they were quickly deemed to be unnecessarily large. As a result PR 4470 introduced the more compact "binary" CMap format, with both of those PRs being included in the very same release (version `0.8.1334`).
Please note that we've thus never shipped anything *except* the "binary" CMap files with the PDF library, and furthermore note that we've not even once updated the CMap files since they were originally added almost nine years ago.
Requiring users to remember that `cMapPacked = true` is necessary, in addition to setting the `cMapUrl` parameter, in order for CMap loading to work feels like a less than ideal API.
Hence this patch, which suggests that we simply let `cMapPacked` default to `true` now.
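In practice this means that loading the bundled "binary" CMaps becomes (the paths here are illustrative):

```js
const loadingTask = pdfjsLib.getDocument({
  url: "document.pdf",
  cMapUrl: "../node_modules/pdfjs-dist/cmaps/",
  // cMapPacked: true, // no longer necessary, it now defaults to true
});
```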
If this method was added today, I really can't imagine that we'd support anything *except* objects. Unfortunately we cannot just remove this now, since the code has existed since "forever", however we can deprecate this and limit it to only the GENERIC build.
Furthermore, we can avoid a redundant `PDFViewerApplication.setTitleUsingUrl` call in the Firefox PDF Viewer since the title has already been set previously in that case.
Until just recently the only existing `Path2D` polyfill didn't have support for Node.js and/or the `node-canvas` package. Given that this was just fixed, in the latest version, we can now finally remove our inline-checks at the relevant call-sites; please also see https://github.com/nilzona/path2d-polyfill#usage-with-node-canvas
In PR #15757, a value is automatically converted into a number when possible,
but the case of numbers like "000123" was overlooked, and their format must
be preserved.
When a script is doing something like "foo.value + bar.value" and the values are
numbers, then "foo.value" must return a number, but the displayed value must be what
the user entered or what a script set, so this patch just adds a field
_originalValue in order to track the value as it was defined.
Some people are used to using a comma as the decimal separator, hence it must be taken
into account when a value is parsed into a number.
This patch fixes a regression introduced by #15757.
There's really no need for these "complicated" default value assignments, since `GlobalWorkerOptions` is a local variable at this point, and this is rather a case of too much copy-and-paste.
Note that years ago, when all options were set using a global `PDFJS` object, it's possible that options had been set (from the outside) *before* the object had been properly initialized; see e.g. a89071bdef/src/display/global.js
In general it's always recommended to pass a *parameter object* when calling the `getDocument`-function in the API, since that's the only way to provide additional options, and the fact that it also accepts a URL or TypedArray directly is now mostly for backwards compatibility reasons.
Unfortunately we cannot really remove this, since that code has existed since "forever", however we can limit it to only the GENERIC build to avoid completely unnecessary checks in e.g. the Firefox PDF Viewer.
Finally, note that the default-viewer always provides a *parameter object* when calling the `getDocument`-function and it's thus completely unaffected by these changes.
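For illustration, both call styles side by side:

```js
// Recommended: a parameter object, the only way to pass extra options.
const loadingTask = pdfjsLib.getDocument({
  url: "https://example.com/document.pdf",
});

// Backwards-compatibility shorthand, now limited to the GENERIC build:
// const legacyTask = pdfjsLib.getDocument("https://example.com/document.pdf");
```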
By being less specific about which *exact* JavaScript features are required for the default vs `legacy` build, we don't need to worry about keeping multiple README files up-to-date.
These README files will now refer back to the FAQ for current browser/environment support information.
This function is only used in PresentationMode these days, but we can still improve it a little bit:
- Use the existing web-platform `deltaMode` constants, rather than defining our own constants for those values.
- Access the `deltaMode` first, before the `delta{X, Y}` properties, to avoid being affected by bug 1392460 (similar to the default viewer).
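A rough sketch of the intended pattern; the normalization factors below are assumptions, not the actual values:

```js
function normalizeWheelEventDelta(evt) {
  // Read `deltaMode` *first*, before deltaX/deltaY (bug 1392460).
  const deltaMode = evt.deltaMode;
  let delta = Math.hypot(evt.deltaX, evt.deltaY);
  // Use the web-platform constants, rather than defining our own.
  if (deltaMode === WheelEvent.DOM_DELTA_PIXEL) {
    delta /= 900; // Assumed pixels-per-page factor.
  } else if (deltaMode === WheelEvent.DOM_DELTA_LINE) {
    delta /= 30; // Assumed lines-per-page factor.
  }
  return delta; // DOM_DELTA_PAGE already maps to whole pages.
}
```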
- Use a `URL`-instance directly, since it's by definition an absolute URL.
- Actually limit the "raw" url-string handling to Node.js environments, as intended.
- Skip the warning, since we're already throwing an Error if the `url`-parameter is invalid.
It seems nicer overall, since we're exporting the `ProgressBar` in the viewer-components, to move this functionality into the `ProgressBar`-class itself rather than handling it "manually" in the default-viewer.
*Please note:* I cannot reproduce the problem reported in bug 1811668, regarding the context menu, and in any case it's not clear that that part is even a PDF Viewer bug.
Looking at bug 1811668 I couldn't help noticing that the textLayer isn't correct, and it's unfortunately once again a problem with the `adjustType1ToUnicode` function. That's intended to help improve text-selection for fonts without a /ToUnicode-entry, and in many cases it does help (the original PR fixed lots of issues), however it's also caused some problems.
In order to improve text-selection in bug 1811668, we'll now properly ignore fonts that have a predefined *named* encoding specified since that's really the intention with PR 14050.
At the beginning of a search, an update can be triggered with 0 of 0
found matches.
In the GeckoView context, we can't update the finder whenever we want, but only
when an update has been requested.
The JBIG2 images in this PDF document are corrupt enough that even Adobe Reader warns about it when opening the file.
*Please note:* I don't really know the JBIG2 image format at all, however from a very brief look at the specification it seems that integers should be 32-bit.
In general it's recommended to pass a *parameter object* when calling the `getDocument`-function in the API, since that's the only way to provide additional options, and the fact that it also accepts a URL or TypedArray directly is now mostly for backwards compatibility reasons.
However, the `getDocument`-function also accepts a direct `PDFDataRangeTransport`-instance which just seems unnecessary.
*Please note:* The `PDFDataRangeTransport`-implementation was added specifically for the *built-in* Firefox PDF Viewer, however it's most likely not commonly used by any third-party (given that it requires manual PDF-data loading).
Furthermore, the default-viewer always provides a *parameter object* when calling the `getDocument`-function and it's thus completely unaffected by these changes.
Rather than adding `@media (forced-colors: active) { ... }`-blocks throughout the CSS code, we should utilize CSS variables instead as in our other CSS files.
The relevant TrueType font is missing both /ToUnicode *and* /Encoding entries, either of which would have prevented the (current) broken textLayer rendering.
My first idea was that we could use the `post` table in the TrueType font, see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6post.html, to get the actual glyphNames and amend the fallback ToUnicode-map that way. Unfortunately that didn't work, since the `post` table only contained ".notdef" and "" (i.e. empty string) entries.
Instead we try to use the `name` table in the TrueType font, see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6name.html, to determine if the platform is Windows and thus fallback to generate a ToUnicode-map from the `WinAnsiEncoding`.
When a CSS variable is updated on a node, all the children under this
node are updated as well.
In order to avoid updating the whole UI when a page is rescaled, this
patch moves the --scale-factor from the :root to the viewer container.
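A minimal sketch of the idea; the container id and scale value are assumptions:

```js
// Setting the variable on the viewer container, rather than on :root,
// limits style recalculation on rescale to the nodes below it.
const pageScale = 1.5; // Example scale value.
const container = document.getElementById("viewerContainer"); // assumed id
container.style.setProperty("--scale-factor", String(pageScale));
```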
Note how all over the `src/core/annotation.js`-code we're assuming that if an `appearance`-entry exists it's also a Stream. However, we're not actually checking that thoroughly enough which causes issues in some badly generated PDF documents.
After the changes in PR 15812 we'll now *intermittently* display completely black canvases during zooming. To reproduce this, try switching to wrapped-scrolling and zoom in/out very quickly using either the mouse-wheel or pinching.
This patch removes the recently introduced `transferPdfData` API-option, and simply enables transferring of TypedArray data *by default* instead of copying it. This will help reduce main-thread memory usage, however it will take ownership of the TypedArrays. Currently this only applies to the following cases:
- TypedArrays passed to the `getDocument`-function in the API, in order to open PDF documents from binary data.
- TypedArrays passed to a `PDFDataRangeTransport`-instance, used to support custom PDF document fetching/loading (see e.g. the Firefox PDF Viewer).
*PLEASE NOTE:* To avoid being affected by this, please simply *copy* any TypedArray data before passing it to either of the functions/methods mentioned above.
Now that we transfer TypedArray data that we previously only copied, we need to be more careful with input validation. Given how the `{IPDFStreamReader, IPDFStreamRangeReader}.read` methods will always return ArrayBuffer data, which is then transferred to the worker-thread[1], the actual TypedArray data passed to the API thus need to have the same exact size as its underlying ArrayBuffer to prevent issues.
Hence we'll check for this and only allow transferring of *safe* TypedArray data, and fallback to simply copying the data just as before. This obviously shouldn't be an issue in the Firefox PDF Viewer, but for the general PDF.js library we need to be more careful here.
---
[1] See e09ad99973/src/display/api.js (L2492-L2506) respectively e09ad99973/src/display/api.js (L2578-L2590)
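To make the implications concrete, two hedged sketches follow: first what a caller can do to keep using their data, then roughly what "safe to transfer" means here (the helper name is made up):

```js
// Caller side: pass a copy if the TypedArray is still needed afterwards,
// since ownership of the data now moves to the worker-thread by default.
const data = new Uint8Array(await (await fetch("document.pdf")).arrayBuffer());
const loadingTask = pdfjsLib.getDocument({ data: data.slice() });
// `data` remains fully usable here; only the copy was transferred.

// Library side: only transfer when the view covers its entire underlying
// ArrayBuffer, otherwise fall back to copying just as before.
function isSafeToTransfer(view) { // hypothetical helper
  return view.byteOffset === 0 && view.byteLength === view.buffer.byteLength;
}
```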
Note how in the API we're transferring the PDF data that's fetched over the network[1]:
- f28bf23a31/src/display/api.js (L2467-L2480)
- f28bf23a31/src/display/api.js (L2553-L2564)
To support that functionality we have the `PDFDataTransportStream`, `PDFFetchStream`, `PDFNetworkStream`, and `PDFNodeStream` implementations. Here these stream-implementations vary slightly in how they handle `ArrayBuffer`s internally, w.r.t. transferring or copying the data:
- In `PDFDataTransportStream` we optionally, after PR 15908, allow transferring of the PDF data as provided externally (used e.g. in the Firefox PDF Viewer).
- In `PDFFetchStream` we're currently always copying the PDF data returned by the Fetch API, which seems unnecessary. As discussed in PR 15908, it'd seem very weird if this sort of browser API didn't allow transferring of the returned data.
- In `PDFNetworkStream` we're already, since many years, transferring the PDF data returned by the `XMLHttpRequest` functionality. Note how the `getArrayBuffer` helper function simply returns an `ArrayBuffer` response as-is.
- In `PDFNodeStream` we're currently copying the PDF data, however this is unfortunately necessary since Node.js returns data as a `Buffer` object[2].
Given that the `PDFNetworkStream` has been, indirectly, supporting transferring of PDF data for years it would seem really strange if this didn't also apply to the `PDFFetchStream`-implementation.
Hence this patch simply enables transferring of PDF data, when accessed using the Fetch API, unconditionally to help reduce main-thread memory usage since the `PDFFetchStream`-implementation is used *by default* in browsers (for the GENERIC build).
---
[1] As opposed to PDF data being provided as e.g. a TypedArray when calling `getDocument` in the API.
[2] This is a "special" Node.js object, see https://nodejs.org/api/buffer.html#buffer, which doesn't exist in browsers.
- The scale factor is rounded to only scale by integer percent, hence the unused
ticks are accumulated (like we already do for zooming with the mouse wheel).
- Use the same approach for pinch-to-zoom on a touchscreen: this led to a slight
refactoring of the code, because it previously ignored a not-so-small scale change,
which led to not-so-smooth zooming.
Also, removes the `initialData`-parameter JSDocs for the `getDocument`-function given that this parameter has been completely unused since PR 8982 (over five years ago). Note that the `initialData`-parameter is, and always was, intended to be provided when initializing a `PDFDataRangeTransport`-instance.
Version 16 that we used before is now in maintenance mode, so we should
upgrade to the most recent LTS version. For more information on the
Node.js release schedule please refer to
https://github.com/nodejs/release#release-schedule.
After the changes in PR 15850, the `background-color` of the sidebar is now unnecessarily dark in the light-theme. Hence, we can simply remove this CSS rule to improve things overall (and these changes don't affect the dark-theme much at all).
This is even an overall consistency improvement, given the existing `--sidebar-narrow-bg-color` values.
Given that this is internal functionality, not exposed in the official API, it's not entirely clear (at least to me) why we can't just initialize this directly in `src/display/api.js` instead.
When testing both the development viewer and all the ways in which we run tests, everything still appears to work just fine with this patch.
*Please note:* The reduced test-case is *not* a perfect reproduction of the original PDF document, since this one fails to open in e.g. Adobe Reader, but I do believe that it captures the most important points here.
For corrupt *and* encrypted PDF documents, it's possible that only some trailer dictionaries actually contain an /Encrypt-entry. Previously we could easily miss that, since we generally pick the first not obviously corrupt trailer dictionary, and the solution implemented here is to simply pre-parse all trailer dictionaries to see if there's any /Encrypt-entries.
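A rough sketch of the pre-parsing idea, not the actual implementation; the candidate list is assumed, while `Dict.has`/`Dict.get` are the existing primitives:

```js
// Check *every* trailer candidate for an /Encrypt-entry, rather than
// trusting the first not obviously corrupt one.
let encrypt = null;
for (const trailerDict of trailerCandidates) { // assumed list of parsed Dicts
  if (trailerDict.has("Encrypt")) {
    encrypt = trailerDict.get("Encrypt");
    break;
  }
}
```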
In most cases, showing the loading icon is useless because it's only displayed
for a very short time, and consequently it doesn't bring any useful information
to the user.
After a delay (400 ms), the icon is shown in order to inform the user that the
viewer isn't stuck but is actually doing something.
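A minimal sketch of the delayed-display pattern described above; the element and promise names are assumptions:

```js
const LOADING_ICON_DELAY = 400; // ms
loadingIcon.hidden = true; // `loadingIcon` is an assumed DOM element.
const timeout = setTimeout(() => {
  // Only shown if the operation is still pending after the delay.
  loadingIcon.hidden = false;
}, LOADING_ICON_DELAY);
renderingPromise.finally(() => { // `renderingPromise` is assumed too.
  clearTimeout(timeout);
  loadingIcon.hidden = true;
});
```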
In GeckoView, on an event, a callback must be executed with the result of an action,
but the callback can be used only once.
So for each FindInPage event, we must trigger only one matches-count update.
This fixes a warning reported by CodeQL, and should also make general sense given that we parse the font-data to determine the *actual* `type`/`subtype` rather than trusting the PDF document.
This option/preference was disabled in GENERIC builds, see PR 15812, to avoid landing it *just before* a new release. Hence it should be fine to enable this now.
This was deprecated in PR 15758 but it's unfortunately quite difficult to tell if third-party users are depending on this, e.g. to implement custom error reporting, and if so to what extent.
However, thanks to the pre-processor we can limit *most* of this code to GENERIC builds, which still seems like a worthwhile change.
These changes reduce the bundle size of the Firefox PDF Viewer by 3.8 kB in total.
This was deprecated in PR 15758 and given that it's quite unlikely that any third-party users are relying on this functionality, since it was only ever added to support telemetry reporting in the Firefox PDF Viewer, it should hopefully be fine to remove this fairly quickly.
These changes reduce the bundle size of the Firefox PDF Viewer by 4.5 kB in total.
Given that the Fetch API only supports the http/https protocols, worker-thread fetching of CMaps and Standard-fonts may thus fail in certain cases. To improve the default behaviour we'll now also check that the `cMapUrl` and `standardFontDataUrl` options are appropriate, except in Firefox where this should always work.
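A sketch of the kind of validation described; the helper below is hypothetical and simply mirrors the description:

```js
// The Fetch API only supports http/https, hence worker-thread fetching
// of CMaps/standard fonts can only work for such URLs.
function isValidFetchUrl(url, baseUrl) { // hypothetical helper
  try {
    const { protocol } = baseUrl ? new URL(url, baseUrl) : new URL(url);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // `new URL` throws on invalid input.
  }
}
```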
This tweaks a few names that originated in PR 15812, to improve overall consistency:
- Use the `drawingDelay` parameter-name in all methods that accept a delay.
- Use the `postponeDrawing` variable-name in all relevant methods.
With upcoming background changes elsewhere in the viewer, this should be helpful in separating the styling of the loadingBar. These changes also mean that both the "regular" and the "indeterminate" loadingBar now use the same `background-color` value.
Also, shortens the related CSS variables a little bit since that can't hurt.
*This makes the same kind of changes as in the previous patch, but for the pageNumber-loadingIcon in the main toolbar.*
To display the pageNumber-loadingIcon when rendering starts, if the page is the most visible one, we'll utilize the existing "pagerender" event.
To toggle the pageNumber-loadingIcon as the user moves through the document we'll now instead utilize the "pagechanging" event, which should actually be slightly more efficient overall[1]. Note how we'd, in the old code, only consider the most visible page anyway when toggling the pageNumber-loadingIcon.
---
[1] Even in a PDF document as relatively short/simple as `tracemonkey.pdf`, scrolling through the entire document can easily trigger the "updateviewarea" event more than a thousand times.
Given that we only render one page at a time, this will lead to only *one* page-loadingIcon being displayed at a time even if multiple pages are visible in the viewer. However, this will make it clearer which page is the currently parsing/rendering one.
To simplify toggling of the page-loadingIcon visibility, the existing `PDFPageView.renderingState` is changed into a getter/setter-pair with the latter also handling the page-loadingIcon state.
An additional benefit of these changes is that the `PDFViewer` no longer needs to handle toggling of page-loadingIcon visibility during rendering, since there can only ever be *one* page rendering.
Finally, this may also simplify future changes w.r.t. page-loadingIcon visibility toggling (using e.g. a show-timeout).
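A simplified sketch of the getter/setter-pair idea; the state values and element name are assumptions:

```js
class PDFPageView {
  #renderingState = "INITIAL"; // Stand-in for RenderingStates.INITIAL.

  get renderingState() {
    return this.#renderingState;
  }

  set renderingState(state) {
    this.#renderingState = state;
    // Toggling the page-loadingIcon is now a side-effect of every state
    // change, so callers no longer manage its visibility manually.
    if (this.loadingIconDiv) {
      this.loadingIconDiv.hidden = state !== "RUNNING";
    }
  }
}
```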
The rotation-caching added in PR 15812 completely breaks initialization of PDF documents with varying page sizes, causing all pages to wrongly get the same size; see e.g. `sizes.pdf` from the test-suite.
To fix that without having to e.g. add a new parameter, which feels error prone, this patch changes the `PDFPageView.#setDimensions` method to completely ignore the rotation-caching until the `setPdfPage`-method has been called.
Note how, in the scripting initialization in the viewer, we only ever invoke `PDFPageProxy.getJSActions` *once* per page in order to improve overall performance; see a575aa13b9/web/pdf_scripting_manager.js (L372-L375)
Hence it really shouldn't be necessary to cache its result in the API, especially when that is done *manually* rather than using something like `shadow`.
When we're destroying a `PDFPageProxy`-instance, during full document destruction, we'll force-abort any worker-thread parsing of operatorLists. Hence we should make sure that any pending cancel timeout is always aborted, since a later `PDFPageProxy._abortOperatorList` call should always "replace" a previous one.
*Please note:* Technically this was always wrong, but with the changes in PR 15825 it became *ever so slightly* easier to trigger this thanks to the potentially longer timeout.
Right now, the visible pages are redrawn for each scale change.
Consequently, zooming with the mouse wheel or by pinching can be pretty janky
(even on a desktop machine, especially with a HiDPI screen).
So the main idea in this patch is to draw the visible pages only once zooming
is finished.
After the changes in PR 15829 the `loadingIconDiv` is no longer always visible when it should be, specifically in the case where we cancel and re-render a partially parsed/rendered page.
To reproduce this, try opening https://github.com/mozilla/pdf.js/files/1522715/wuppertal_2012.pdf in the viewer and change the zoom level while rendering is ongoing. In this case the `loadingIconDiv` doesn't actually become visible, despite being present in the DOM, since it's no longer at the end of the page-div.
I don't know to what extent this renders PR 15829 "pointless", however we're not repeatedly re-creating and re-inserting the `loadingIconDiv` but rather just *moving* the existing element in the DOM.
When trying to find incomplete objects, i.e. those missing the "endobj"-string at the end, there's unfortunately a number of possible operators that we need to check for. Otherwise we could miss e.g. the "trailer" at the end of a corrupt PDF document, which is why the referenced document didn't work.
Currently we do all searching on the "raw" bytes of the PDF document, for efficiency, however this doesn't really work when we need to check for *multiple* potential command-strings. To keep the complexity manageable we'll instead use regular expressions here, but we can at least avoid creating lots of substrings thanks to the `RegExp.lastIndex` property; which is well supported across browsers according to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex#browser_compatibility
Note that this repeated regular expression usage could perhaps be slightly less efficient than the old code, however this method is only invoked for corrupt PDF documents.
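A sketch of the `RegExp.lastIndex` technique; the operator list shown is illustrative, not the actual one:

```js
// A global RegExp resumes matching at `lastIndex`, so no substrings of
// the (potentially huge) document string are ever created.
const OPERATOR_REGEXP = /\b(?:trailer|startxref|obj)\b/g;

function findNextOperator(str, position) {
  OPERATOR_REGEXP.lastIndex = position; // Start from the current offset.
  const match = OPERATOR_REGEXP.exec(str);
  return match ? match.index : -1;
}
```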
Given that the `ProgressBar`-constructor was updated, we need to update the "mobile-viewer" example as well; this is yet another thing I missed during review.
We'll no longer import the `SimpleLinkService` dependency unconditionally in the file, since it's only used in COMPONENTS-builds.
Furthermore, for the COMPONENTS-builds, we'll create a `SimpleLinkService`-instance only for those layers that actually need it.
Please note that this functionality has never really mattered for the Firefox PDF Viewer, the GENERIC viewer, or even the "simpleviewer"/"singlepageviewer" component-examples. Hence, in practice this means that only the "pageviewer" component-example[1] has ever really utilized this.
Using factories to initialize various layers in the viewer, rather than simply invoking the relevant code directly, seems (at least to me) like a somewhat roundabout way of doing things.
Not only does this lead to more code, both to write and maintain, but since many of the layers have common parameters (e.g. an `AnnotationStorage`-instance) there's also some duplication.
Hence this patch, which removes the `xfaLayerFactory` and instead uses a lookup-function in the `PDFPageView`-class to access the external viewer-properties as necessary.
Note that this should even be an improvement for the "pageviewer" component-example, since most layers will now work by default rather than require manual configuration.
---
[1] In practice we generally suggest using the "simpleviewer", or "singlepageviewer", since it does *most* things out-of-the-box and given that a lot of functionality really requires *a viewer*, and not just a single page, in order to work.
Please note that this functionality has never really mattered for the Firefox PDF Viewer, the GENERIC viewer, or even the "simpleviewer"/"singlepageviewer" component-examples. Hence, in practice this means that only the "pageviewer" component-example[1] has ever really utilized this.
Using factories to initialize various layers in the viewer, rather than simply invoking the relevant code directly, seems (at least to me) like a somewhat roundabout way of doing things.
Not only does this lead to more code, both to write and maintain, but since many of the layers have common parameters (e.g. an `AnnotationStorage`-instance) there's also some duplication.
Hence this patch, which removes the `textLayerFactory` and instead uses a lookup-function in the `PDFPageView`-class to access the external viewer-properties as necessary.
Note that this should even be an improvement for the "pageviewer" component-example, since most layers will now work by default rather than require manual configuration.
---
[1] In practice we generally suggest using the "simpleviewer", or "singlepageviewer", since it does *most* things out-of-the-box and given that a lot of functionality really requires *a viewer*, and not just a single page, in order to work.
Please note that this functionality has never really mattered for the Firefox PDF Viewer, the GENERIC viewer, or even the "simpleviewer"/"singlepageviewer" component-examples. Hence, in practice this means that only the "pageviewer" component-example[1] has ever really utilized this.
Using factories to initialize various layers in the viewer, rather than simply invoking the relevant code directly, seems (at least to me) like a somewhat roundabout way of doing things.
Not only does this lead to more code, both to write and maintain, but since many of the layers have common parameters (e.g. an `AnnotationStorage`-instance) there's also some duplication.
Hence this patch, which removes the `textHighlighterFactory` and instead uses a lookup-function in the `PDFPageView`-class to access the external viewer-properties as necessary.
Note that this should even be an improvement for the "pageviewer" component-example, since most layers will now work by default rather than require manual configuration.
---
[1] In practice we generally suggest using the "simpleviewer", or "singlepageviewer", since it does *most* things out-of-the-box and given that a lot of functionality really requires *a viewer*, and not just a single page, in order to work.
Please note that this functionality has never really mattered for the Firefox PDF Viewer, the GENERIC viewer, or even the "simpleviewer"/"singlepageviewer" component-examples. Hence, in practice this means that only the "pageviewer" component-example[1] has ever really utilized this.
Using factories to initialize various layers in the viewer, rather than simply invoking the relevant code directly, seems (at least to me) like a somewhat roundabout way of doing things.
Not only does this lead to more code, both to write and maintain, but since many of the layers have common parameters (e.g. an `AnnotationStorage`-instance) there's also some duplication.
Hence this patch, which removes the `structTreeLayerFactory` and instead uses a lookup-function in the `PDFPageView`-class to access the external viewer-properties as necessary.
Note that this should even be an improvement for the "pageviewer" component-example, since most layers will now work by default rather than require manual configuration.
---
[1] In practice we generally suggest using the "simpleviewer", or "singlepageviewer", since it does *most* things out-of-the-box and given that a lot of functionality really requires *a viewer*, and not just a single page, in order to work.
Please note that this functionality has never really mattered for the Firefox PDF Viewer, the GENERIC viewer, or even the "simpleviewer"/"singlepageviewer" component-examples. Hence, in practice this means that only the "pageviewer" component-example[1] has ever really utilized this.
Using factories to initialize various layers in the viewer, rather than simply invoking the relevant code directly, seems (at least to me) like a somewhat roundabout way of doing things.
Not only does this lead to more code, both to write and maintain, but since many of the layers have common parameters (e.g. an `AnnotationStorage`-instance) there's also some duplication.
Hence this patch, which removes the `annotationLayerFactory` and instead uses a lookup-function in the `PDFPageView`-class to access the external viewer-properties as necessary.
Note that this should even be an improvement for the "pageviewer" component-example, since most layers will now work by default rather than require manual configuration.
---
[1] In practice we generally suggest using the "simpleviewer", or "singlepageviewer", since it does *most* things out-of-the-box and given that a lot of functionality really requires *a viewer*, and not just a single page, in order to work.
Please note that this functionality has never really mattered for the Firefox PDF Viewer, the GENERIC viewer, or even the "simpleviewer"/"singlepageviewer" component-examples. Hence, in practice this means that only the "pageviewer" component-example[1] has ever really utilized this.
Using factories to initialize various layers in the viewer, rather than simply invoking the relevant code directly, seems (at least to me) like a somewhat roundabout way of doing things.
Not only does this lead to more code, both to write and maintain, but since many of the layers have common parameters (e.g. an `AnnotationStorage`-instance) there's also some duplication.
Hence this patch, which removes the `annotationEditorLayerFactory` and instead uses a lookup-function in the `PDFPageView`-class to access the external viewer-properties as necessary.
Note that this should even be an improvement for the "pageviewer" component-example, since most layers will now work by default rather than require manual configuration.
---
[1] In practice we generally suggest using the "simpleviewer", or "singlepageviewer", since it does *most* things out-of-the-box and given that a lot of functionality really requires *a viewer*, and not just a single page, in order to work.
Currently we'll only initialize and render the `annotationEditorLayer` once the regular `annotationLayer` has been rendered.
While it obviously makes sense to render the `annotationEditorLayer` *last*, the way that the code is currently written means that if a third-party user disables the `annotationLayer` then the editing-functionality indirectly becomes disabled as well.
Given that this seems like a somewhat arbitrary limitation, this patch simply decouples these two layers while still keeping the rendering order consistent.
By moving this code the "pageviewer"-component example will become slightly more usable on its own, it may simplify a future addition of XFA Foreground document support, and finally also serves as preparation for the following patches.
The container position and dimensions should be almost constant, hence
it's pretty useless to query them on each rescale.
Finally, it avoids triggering some reflows.
First of all, given the screen-sizes of most mobile phones, using Spread modes is unlikely to be useful.
Secondly, and more importantly, since there's (currently) no UI available for the user to override a PDF document-specified Spread mode this would result in a bad UX otherwise.
Also, removes an outdated comment from the `apiPageLayoutToViewerModes` helper function.
Previously we'd abort all parsing if an Error was encountered, despite the fact that multiple `startXRefQueue`-entries may be available and that continued parsing could thus eventually be able to find usable data.
Note that in the referenced PDF document the `startxref`-operator, at the end of the file, points to a position in the middle of an arbitrary `stream` which is why things break.
Depending on e.g. the `textLayerMode` option it might not actually be necessary to always initialize this eagerly.
*Please note:* Unfortunately we cannot `shadow` a private field, hence why this is only made semi-"private".
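For context, roughly what the `shadow` helper does, and why it cannot target a `#`-private field: it (re)defines an ordinary own data property via `Object.defineProperty`, a mechanism that simply doesn't exist for private fields:

```js
// Roughly what the PDF.js `shadow` helper does: replace the (prototype)
// getter with an own data property, so the value is computed only once.
function shadow(obj, prop, value) {
  Object.defineProperty(obj, prop, {
    value,
    enumerable: true,
    configurable: true,
    writable: false,
  });
  return value;
}
```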
This is done to support upcoming viewer-changes, and in order to prevent third-party users from outright breaking things we'll simply ignore too large values.
It's a follow-up of #14950: some format actions are run when the document is opened,
but we must be sure we have everything ready for that, hence we have to run some
named actions before running the global format.
In playing with the form, I discovered that the blur event wasn't triggered when
JS called `setFocus` (because in such a case the mouse was never down). So I removed
the mouseState thing and just use the correct commitKey when blur is triggered by a
TAB key.
In order to move the annotations in the DOM so that they correspond
to the visual order, we need to have their dimensions/positions, which means that
the parent must have some dimensions.
While reviewing recent patches, I couldn't help but noticing that we now have a lot of call-sites that manually access the `PageViewport.viewBox`-property.
Rather than repeating that verbatim all over the code-base, this patch adds a lazily computed and cached getter for this data instead.
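A sketch of the pattern, using the `shadow` helper outlined earlier; the getter name is an assumption:

```js
class PageViewport {
  constructor({ viewBox }) {
    this.viewBox = viewBox; // [x1, y1, x2, y2]
  }

  get rawDims() { // assumed getter name
    const { viewBox } = this;
    // Computed on first access, then cached as an own property by `shadow`.
    return shadow(this, "rawDims", {
      pageWidth: viewBox[2] - viewBox[0],
      pageHeight: viewBox[3] - viewBox[1],
      pageX: viewBox[0],
      pageY: viewBox[1],
    });
  }
}
```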
This was essentially done only to compensate for the viewer calling `PDFPageProxy.getAnnotations` unconditionally on every annotationLayer-rendering invocation. With the previous patch that's no longer happening, and this API-caching should thus no longer be necessary.
For pages without any annotations, applies e.g. to the `tracemonkey.pdf` document, we'll repeatedly try to re-create the `annotationLayer` on every zoom and rotation operation.
The reason that this happens is because we don't insert the `annotationLayer`-div into the DOM unless there's annotations present on the page, which thus means that we miss the existing `annotationLayer`-caching present in the `PDFPageView` implementation.
This is a very old issue, and the easiest solution is to simply always insert an *empty* (and hidden) `annotationLayer`-div such that the existing code/caching starts working for the "no annotations" case as well.
Note that this is consistent with other layers, since e.g. the `textLayer` and/or `annotationEditorLayer` may be empty. Given that only a limited, by default ten, number of pages are ever active at once the additional DOM-elements shouldn't affect things negatively here.
This is consistent with the `render` methods of the other layers, and reduces overall indentation in the method.
Furthermore, don't "swallow" errors since the `PDFPageView._renderXfaLayer` method is already able to deal with that.
It doesn't seem necessary to have a *separate* `destroy` method given that the `cancel` method always invokes it unconditionally.
In the `PDFPageView.reset` method we currently attempt to call `destroy` directly, however that'll never actually happen since either:
- We're keeping the annotationEditorLayer, in which case we're just hiding the layer and nothing more (and the relevant branch is never entered).
- We're removing the annotationEditorLayer, in which case the `PDFPageView.cancelRendering` method has already cancelled *and* nulled it (and there's thus nothing left to `destroy` here).
*Please note:* Hopefully I'm not overlooking something obvious here, since both reading through the code *and* also adding `console.log(this.annotationEditorLayer);` [before this line](9d4aadbf7a/web/pdf_page_view.js (L438)) suggests that it's indeed unnecessary.
In PR 14877 I forgot to update the horizontal padding, used when computing the scale of the pages, for the case where SpreadModes and PresentationMode are being used together.
Steps to reproduce:
1. Open the viewer with the default `tracemonkey.pdf` document.
2. Enable any SpreadMode.
3. Rotate the document *once*, either clockwise or counterclockwise.
4. Enter PresentationMode.
5. Try switching pages, e.g. by clicking on the document.
Expected result:
The visible pages change as you click.
Actual result:
The visible pages are "stuck" in the current view.
The `PDFPageProxy._pageIndex` property is a "private" one that shouldn't be accessed, since it could theoretically break tomorrow if we re-factor the relevant API code.
Also, try to clean-up and improve consistency in a couple of JSDoc comments.
This change was made in PR 5552, however I cannot tell why we needed to disable searching in PresentationMode. Furthermore, with the changes in PR 13908 which effectively moved where this code is invoked, searching has now (accidentally) been working in PresentationMode in e.g. the Firefox PDF Viewer for well over a year.
So, let's just enable searching unconditionally in PresentationMode to simplify the code.
An annotation editor layer can be destroyed when it's invisible, hence some
annotations can have a null parent. But when printing/saving, or when changing
the font size, color, ... of all added annotations (when selected with ctrl+a), we
still need to have some parent properties, especially the page dimensions, global
scale factor and global rotation angle.
This patch aims to remove all the references to the parent in the editor instances,
except in some cases where an editor should obviously have one.
It fixes #15780.
The main issue is due to the fact that an editor's parent can be null when
we want to serialize it, and that leads to an exception which breaks the whole
saving/printing process.
So this incomplete patch fixes only the saving/printing issue, but not the
underlying problem (i.e. having a null parent), and doesn't bring that much
complexity, so it should help to uplift it to the next Firefox release.
Rather than handling these parameters separately, which is a left-over from back when streaming of textContent was originally added, we can simply pass either data directly to the `TextLayer` and let it handle things accordingly.
Also, improves a few JSDoc comments and `typedef`-imports.
Compared to the recent PR 15722 for the `textLayer` this one should be a (comparatively) much smaller win overall, since most documents don't have any structTree-data and the required parsing should be cheaper. However, it seems to me that it cannot hurt to improve this nonetheless.
Note that by moving the `structTreeLayer` initialization we remove the need for the "textlayerrendered" event listener, which thus simplifies the code a little bit.
Also, removes the API-caching of the structTree-data since this was basically done to offset the lack of caching in the viewer.
*Please note:* I don't really expect that this will be an observable change, since virtually all PDF documents already order e.g. /MediaBox and /CropBox entries correctly.
By normalizing boundingBoxes already on the worker-thread, we can be sure that even a corrupt document won't cause issues.
Note how we're passing the `view`-getter to the `PartialEvaluator.getTextContent` method, in order to detect textContent which is outside of the page, hence it makes sense to ensure that it's formatted as expected.
Furthermore, by normalizing this once on the worker-thread we should no longer have to worry about a possibly negative width/height in the `PageViewport` constructor.
Finally, the patch also simplifies the `view`-getter a little bit.
The idea is just to reuse what we got on the first draw.
Now, we only update the scaleX of the different spans and the other values
are dependent on --scale-factor.
Move some properties into the CSS in order to avoid any updates in JS.
The deprecation is included in the current release, i.e. version `3.1.81`, and given the edge-case nature of this option I really don't think that we need to keep it deprecated for multiple releases.
This patch has been successfully tested in a local, artifact, Firefox build.
*Please note:* The only thing that'll no longer work for PDF documents opened using "data:"-URLs is middle-clicking on internal/outline links, in order to open the destination in a new tab. This is however an extremely small loss of functionality, and as can be seen in the bug the alternative (i.e. doing nothing) is surely much worse.
Currently both of the `AnnotationElement` and `KeyboardManager` classes contain *identical* `platform` getters, which seems like unnecessary duplication.
With the pre-processor we can also limit the feature-testing to only GENERIC builds, since `navigator` should always be available in browsers.
Add a deprecation notification for `PDFDocumentLoadingTask.onUnsupportedFeature` and `PDFDocumentProxy.stats`,
which are likely useless.
The unsupported-feature functionality was initially added in #4048 in order to be able to display a
warning bar and to have some numbers on how a feature was used.
Those data are no longer used in Firefox.
This is very old code, which is unused (by default) in browsers nowadays since the Font Loading API will always be preferred.
For Node.js environments we use the same constant as elsewhere throughout the code-base, and we can also simplify the Firefox-specific check given that the lowest supported version is `102` (as of this writing).
Finally the old TODO is removed, since the general availability of the Font Loading API has made it redundant.
The use of `Array.prototype.reduce()` is, in my opinion, hurting overall readability since it's not particularly easy to look at the relevant code and immediately understand what's going on here. Furthermore this code leads to strictly speaking unnecessary allocations and parsing, since we could just track the min/max values directly in the relevant loop instead.
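The gist of the suggested change, as a sketch; `values` is a stand-in name:

```js
// Before (sketch): intermediate allocations, extra passes.
//   const min = values.reduce((a, b) => Math.min(a, b), Infinity);
// After: track min/max directly, in a single loop.
let min = Infinity,
  max = -Infinity;
for (const value of values) {
  if (value < min) min = value;
  if (value > max) max = value;
}
```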
This has never really been used anywhere within the PDF.js library[1], and when streaming of textContent was introduced this parameter was effectively made redundant.
Note that when streaming of textContent is used, all text-layout has already happened by the time that this `timeout`-functionality is actually invoked (thus making it pointless).
While the `timeout`-functionality may still "work" when the textContent is provided upfront, although it's never been used/tested, streaming will generally perform better (in e.g. a viewer setting).
*Please note:* While unrelated here, also removes a now unused property that I forgot in PR 15259.
---
[1] At least not since the code was moved into its current file, which happened in PR 6619 and landed seven years ago.
This can't be a particularly common feature, since we've supported Optional Content for over two years and this is the very first TilingPattern-case we've seen.
The reason for the issue is that we use the generic `getFilenameFromUrl` helper function, which was originally intended for regular URLs.
For the filenames we're dealing with in FileAttachments, we really only want to strip the path when one exists[1].
---
[1] See [bug 1230933](https://bugzilla.mozilla.org/show_bug.cgi?id=1230933) for an example of such a case.
With the changes made in PR 14564 this *should* no longer be necessary now, however we still need to keep the `scrollMatches` parameter to handle textLayers with markedContent correctly when searching.
Currently *some* functions in this file have names while others don't, and in a few cases the names are no longer entirely accurate.
For the relevant functions there should really be no need to name them, and if memory serves this was originally done since browsers (many years ago) didn't always handle anonymous functions correctly in stack traces.
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.
Given that these functions are virtually identical, with the latter only adding a BOM, we can combine the two. Furthermore, since both functions were only used on the worker-thread, there's no reason to duplicate this functionality in both of the `pdf.js` and `pdf.worker.js` files.
Having just played around with adding FreeText-annotations and then trying to print, there were `FreeTextAnnotation: OffscreenCanvas is not supported, annotation may not render correctly.` messages printed in the console.
The reason for this is that `FreeTextAnnotation` inherits from `MarkupAnnotation`, however only `WidgetAnnotation` actually defines the `_isOffscreenCanvasSupported` property.
Adding some logging with `console.{time, timeEnd}` around all the constant definitions at the top of the `web/pdf_find_controller.js` file, I noticed that computing `DIACRITICS_EXCEPTION_STR` took close to half the total time.
My first idea was just to try and make it slightly more efficient, by reducing the amount of iterations and intermediate allocations. However, with this constant only being used during "match diacritics" searches it thus seemed like a good candidate for lazy initialization.
*Please note:* Given that this is a micro optimization, I fully understand if the patch is rejected.
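A sketch of the lazy-initialization pattern, assuming `DIACRITICS_EXCEPTION` is the existing Set of char-codes:

```js
let DIACRITICS_EXCEPTION_STR = null; // No longer computed eagerly.

function getDiacriticsExceptionStr() {
  // Only "match diacritics" searches ever pay the computation cost.
  return (DIACRITICS_EXCEPTION_STR ||= [...DIACRITICS_EXCEPTION]
    .map(charCode => String.fromCharCode(charCode))
    .join(""));
}
```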
Given that this PDF document is an interesting test-case for performance reasons, w.r.t. inline image caching, it probably can't hurt to add it to the test-suite to make it more readily available.
Considering the contents of that PDF document I'm not sure if we can include it directly in the repository, hence why a *linked* test-case was chosen here.
Given that this `assert` is only intended to catch any implementation bugs in our code, and not actually to validate the PDF data directly[1], we can avoid making this function call unconditionally.
---
[1] In those cases, for example a `FormatError` should have been thrown instead.
With modern EcmaScript features, we can define these fields directly instead. Please note that for backwards compatibility purposes they are still public as before, however this functionality is *disabled* by default (see the `pdfBug` API option).
Also, we can (slightly) simplify the two loops used in the `toString` method.
These fields were never intended to be public, since modifying them manually would lead to inconsistent state, and with modern EcmaScript features we can now enforce this.
Also, this patch removes a couple of JSDoc comments that we generally don't use.
- For text fields
 * when printing, we generate a fake font which contains some widths computed thanks to
   an OffscreenCanvas and its method measureText.
   In order to avoid having to lay out the glyphs ourselves, we just render all of them
   in one call in the showText method, using the system sans-serif/monospace fonts.
 * when saving, we continue to create the appearance streams if the fonts contain the chars,
   but when a char is missing, we just set the flag /NeedAppearances to true in the AcroForm
   dict and remove the appearance stream. This way, we let the different readers handle
   the rendering of the strings.
- For FreeText annotations
 * when printing, we use the same trick as for text fields.
 * there is no need to save an appearance, since Acrobat is able to infer one from the
   Content entry.
These variables are left-over from the initial implementation, back when `String.prototype.padStart` didn't exist and we thus had to pad manually (using a loop).
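For comparison, the modern one-liner that made the manual padding loop unnecessary:

```js
const value = 0x7; // Example input.
// Pad the hex string to two digits: "7" -> "07".
const hex = value.toString(16).padStart(2, "0");
```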
This helps improve performance for some PDF documents with a huge number of inline images, e.g. the PDF document from issue 2618.
Given that we no longer create `Stream`-instances unconditionally, we also don't need `Dict`-instances for cached inline images (since we only access the filter).
*Please note:* This only fixes the "wrong letter" part of bug 1799927.
It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly.
Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey to prevent any future collisions. While this will (naturally) lead to slightly higher peak memory usage, it'll however be limited to the current `Parser`-instance which means that it's not persistent.
One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.
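A rough sketch of the cache-key idea, not the actual implementation: with the stringified bytes as the key, equal keys imply equal image data, so collisions become impossible:

```js
// Hypothetical helper: the stringified bytes themselves become the key,
// unlike an Adler-32 hash which can collide for distinct inputs.
function toInlineImageCacheKey(bytes) {
  let key = "";
  for (const byte of bytes) {
    key += String.fromCharCode(byte);
  }
  return key;
}
```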
The purpose of this patch is twofold:
- Initialize the unicode-category data *lazily* during text-extraction, since this is completely unused during general parsing/rendering.
- Stop exposing this data in the API, since it's unused on the main-thread and it seems like it was *accidentally* included.
Obviously these changes are API-observable, but hopefully no user is depending on this. Furthermore, it's trivial for a user to re-create this unicode-category data manually with a regular expression (from the exposed `unicode` property).
Currently, during text-extraction, we're repeatedly normalizing and (when necessary) reversing the unicode-values every time. This seems a little unnecessary, since the result won't change, hence this patch moves that into the `Glyph`-instance and makes it *lazily* initialized.
Taking the `tracemonkey.pdf` document as an example: When extracting the text-content there's a total of 69236 characters but only 595 unique `Glyph`-instances, which means a 99.1 percent cache hit-rate. Generally speaking, the longer a PDF document is the more beneficial this should be.
*Please note:* The old code is fast enough that it unfortunately seems difficult to measure a (clear) performance improvement with this patch, so I completely understand if it's deemed an unnecessary change.
In order to support opening certain corrupt PDF documents, particularly hand-edited ones, this patch adds support for letting the `Catalog.getAllPageDicts` method fallback to returning an *empty* dictionary to replace (only) the first /Page of the document.
Given that the viewer cannot initialize/load without access to the first page, this will thus allow e.g. document-level scripting to run as expected. Note that by effectively replacing a corrupt or missing first /Page in this way[1], we'll now render nothing but a *blank* page for certain cases of broken/corrupt PDF documents which may look weird.
*Please note:* This functionality is controlled via the existing `stopAtErrors` option, that can be passed to `getDocument`, since it's easy to imagine use-cases where this sort of fallback behaviour isn't desirable.
---
[1] Currently we still require that a /Pages-dictionary is found though, however it *may* be possible to relax even that assumption if that becomes absolutely necessary in future corrupt documents.
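Usage sketch of the opt-out mentioned above:

```js
const loadingTask = pdfjsLib.getDocument({
  url: "corrupt.pdf",
  stopAtErrors: true, // Opt out of the "empty first page" fallback.
});
```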
Note that the "trailer"-case is already a fallback, since normally we're able to use the "xref"-operator even in corrupt documents. However, when a "trailer"-operator is found we still expect "startxref" to exist and be usable in order to advance the stream position. When that's not the case, as happens in the referenced issue, we use a simple fallback to find the first "obj" occurrence instead.
This *partially* fixes issue 15590, since without this patch we fail to find any objects at all during `XRef.indexObjects`. However, note that the PDF document is still corrupt and won't render since there's no actual /Pages-dictionary and the /Root-entry simply points to the /OpenAction-dictionary instead.
After the clean-up in PR 15616, the `PdfManager.onLoadedStream` method now only has a single call-site.
Hence why this patch suggests that we remove this method and replace it with an *optional* parameter in `PdfManager.requestLoadedStream` instead. By making the new behaviour opt-in, we'll thus not change any existing call-site.
- Some events, which require a user interaction, will allow those functions to be called.
  But after a few seconds, if there is no further user interaction, it won't be possible
  anymore.
  The idea is to give the user an opportunity to leave the PDF.
- Disable the print function while we're printing, the same with saving, and disallow
  saving on open events.
When a form isn't changed, we use the appearances we had in the file, but when
/NeedAppearances is true, all the appearances have to be regenerated whatever they are.
- In #15373, we implemented copy/paste actions using the system
  clipboard.
  For some reason, on Windows, the clipboard doesn't contain the expected
  data when the tests are run in parallel, hence the tests which use
  the clipboard need to be run sequentially.
- Make sure that we can paste after having copied.
*This is very old code, and it could thus do with some simplification.*
Note how in the `src/core/worker.js` file we're combining both the `PdfManager.requestLoadedStream` and `PdfManager.onLoadedStream` methods in order to access the stream-data. This seems unnecessary, and it's simple enough to always let the `PdfManager.requestLoadedStream` method return the stream-data as well.
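Sketched before/after for the call-site in question (`pdfManager` as in the surrounding text):

```js
// Before: two steps to get at the stream-data.
//   await pdfManager.requestLoadedStream();
//   const stream = pdfManager.onLoadedStream();

// After: `requestLoadedStream` resolves with the stream-data directly.
const stream = await pdfManager.requestLoadedStream();
```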
PR 13725 was only intended as a temporary work-around, and it seems that we can now revert that.
- Firefox 102 is the currently maintained ESR-branch, and the PDF.js project only supports the active one.
- Node.js now works, thanks to the `node-canvas` package, and I've confirmed locally that following the STR in issue 13724 generates a correct image.
In the referenced PDF document there are "numbers" which consist only of `-.`, and while that's obviously not valid Adobe Reader seems to handle it just fine.
Letting this method ignore more invalid "numbers" was suggested during the review of PR 14543, so let's simply relax the validation here.
It appears that PR 15593 broke `issue12402`, and we thus need to partially restore the /Count check.
I completely missed this when looking at the test-results for PR 15593, both locally and on the bots, since the `Driver._getLastPageNumber` method would "swallow" an unavailable page number.
- When we're editing some annotations, keeping the role="text-box" makes them visible
  as editable, and VoiceOver (Mac) is able to read the contents when they're focused;
- Add an attribute "aria-activedescendant" in order to make the content discoverable
  by NVDA on Windows.
After PR 14311, and follow-up patches, we no longer require that the /Count entry (in the /Pages dictionary) is either present or even valid in order to parse/render a PDF document.
Hence it seems strange to keep this requirement for *corrupt* PDF documents, when trying to find a usable `trailer` in the `XRef.indexObjects` method.
With the changes in the previous patch we can move the glyph-cache lookup to the top of the method and thus avoid a bunch of, in *almost* every case, completely unnecessary re-parsing for every `charCode`.
This method, and its class, was originally added in PR 4453 to reduce memory usage when parsing text. Then PR 13494 extended the `Glyph`-representation slightly to also include the `charCode`, which made the `matchesForCache` method *effectively* redundant since most properties on a `Glyph`-instance indirectly depend on that one. The only exception is potentially `isSpace` in multi-byte strings.
Also, something that I noticed when testing this code: The `matchesForCache` method never worked correctly for `Glyph`s containing `accent`-data, since Objects are passed by reference in JavaScript. For affected fonts, of which there's only a handful of examples in our test-suite, we'd fail to find an already existing `Glyph` because of this.
When we fail to find a usable PDF document `trailer` *and* there were errors during parsing, try and fallback to a *previous* generation as a last resort during fetching of uncompressed references.
*Please note:* This will not affect "normal" PDF documents, with valid /XRef data, and even most *corrupt* documents should be completely unaffected by these changes.
Given that the new sidebar icon is slightly shorter than the old one, it cannot hurt to ever so slightly tweak the vertical position of the notification icon.
(While the patch also changes the CSS rule used for the horizontal position, this is a no-op and was done to improve consistency between the two values.)
Part of this is very old code, and back when support for parsing the catalog-version was added things became less clear (in my opinion).
Hence this patch tries to improve things, by e.g. validating the header- and catalog-version separately.
Note how we're currently skipping all main-thread cleanup when document destruction has started, but for some reason we're still dispatching the "Cleanup" message.
This seems like a simple oversight, since destruction will already invoke the `BasePdfManager.cleanup` method (on the worker-thread) to fully clear-out all caches.
Given the sheer number of heuristics added to this method over the years, moving the *valid* unicode found case to the top should improve readability of the code.
- Fix Field::getArray in order to collect only the fields which have a value;
- Fix AFSimple_Calculate:
  * allow a string with a list of field names as argument;
  * since a field can be non-terminal, use Field::getArray to collect
    the fields under it and then apply the calculation to all the descendants.
This code was added all the way back in PR 6698, almost seven years ago, for backwards compatibility reasons. At this point in time, it seems that we can remove that since:
- We have more fine-grained "UnsupportedFeature" reporting elsewhere in the worker-thread code nowadays.
- The GetOperatorList-handling is now using `ReadableStream`s, which means that errors are being forwarded to the main-thread anyway.
- We're also no longer displaying a notification-bar, in the *built-in* Firefox PDF Viewer, for any of these "UnsupportedFeature" messages.
*Please note:* I don't really know what I'm doing here, however the patch appears to fix the referenced issue when comparing the rendering with Adobe Reader (with the caveat that I don't speak the language in question).
When a new PDF document is opened in the GENERIC viewer we (obviously) create a new `AnnotationEditorUIManager`-instance, since those are document-specific, and thus we need to ensure that we actually register the `editorTypes` for each one.
Note how after having found the "%PDF-" prefix we then read both the prefix and the version in the loop, only to then remove the prefix at the end.
It seems better to instead advance the stream position past the "%PDF-" prefix, and then read only the version data.
Finally the loop-condition can also be simplified slightly, to further clean-up some very old code.
*Fixes a regression from PR 15246, sorry about that!*
The return value of all `Annotation.getOperatorList` methods was changed in PR 15246, however I missed updating the error code-path in `Page.getOperatorList` which thus breaks all operatorList-parsing for pages with corrupt Annotations.
Looking at the code on the worker-thread, there doesn't appear to be any particular reason for placing *some* of the properties in a `source`-object when sending them with "GetDocRequest".
As is often the case the explanation for this structure is rather "for historical reasons", since originally we simply sent the `source`-object as-is. Doing that was obviously a bad idea, for a couple of reasons:
- It makes it less clear what is/isn't actually needed on the worker-thread.
- Sending unused properties will unnecessarily increase memory usage.
- The `source`-object may contain unclonable data, which would break the library.
Rather than sending all of these parameters individually and then grouping them together on the worker-thread, we can simply handle that in the API instead.
All of the these constants have been deprecated for a while, and with the upcoming *major* version this seems like a good time to remove them.
For the string-constants we can simply remove them, but the number-constants are left commented out since we don't want to re-number the list to prevent third-party breakage.
The way that we set the width of the `dropdownToolbarButton`-select is very old, and despite some improvements over the years this is still somewhat hacky.
In particular, note how we're assigning the select-element a larger width than its containing `dropdownToolbarButton`-element. This was done to prevent displaying *two* separate icons, i.e. the native and the PDF.js one, since it's the only way to handle this in older browsers (particularly Internet Explorer).
Given the currently supported browsers, there's however a better solution available: use `appearance: none;` to disable native styling of the select-element. [According to MDN](https://developer.mozilla.org/en-US/docs/Web/CSS/appearance#browser_compatibility), this is supported in all reasonably modern browsers.
This way we're able to simplify both the CSS rules and the JS-code that's used to adjust the `dropdownToolbarButton` width in a localization aware way.
Because of https://bugzilla.mozilla.org/show_bug.cgi?id=1582545, the padding-inline is 0 by default.
A value of 0 is not really enough because of the outline, so just set it to 2px (it was 4px before the patch)
in order to have something visually correct.
I noticed that 256 % 3 is equal to 1, so I slightly simplified the code.
The sum of the 16 Uint8 values doesn't exceed 2^12, hence we can just take the
sum modulo 3.
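To illustrate the arithmetic (this is just a sanity check, not the actual code): since 256 % 3 === 1, an integer assembled from bytes is congruent, modulo 3, to the plain sum of its bytes; and with 16 Uint8 values that sum is at most 16 * 255 = 4080 < 2^12, so it cannot overflow.

```js
function mod3(bytes) {
  // Sum the Uint8 values; the result fits comfortably in a regular Number.
  let sum = 0;
  for (const byte of bytes) {
    sum += byte;
  }
  return sum % 3;
}

// Equivalent, because (a * 256 + b) % 3 === (a + b) % 3:
console.log(mod3(new Uint8Array([255, 255]))); // 0
console.log((255 * 256 + 255) % 3); // 0
```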
This method was originally added in PR 1157 (back in 2012), however its only call-site was then removed in PR 2423 (also in 2012).
Hence this method has been completely unused for nearly a decade, and it should thus be safe to remove it.
This patch first of all makes `isOffscreenCanvasSupported` configurable, defaulting to `true` in browsers and `false` in Node.js environments, with a new `getDocument` parameter. While you normally want to use this, in order to improve performance, it should still be possible for users to control it (similar to e.g. `isEvalSupported`).
The specific problem, as reported in issue 14952, is that the SVG back-end doesn't support the new ImageMask data-format that's introduced in PR 14754. In particular:
- When the SVG back-end is used in Node.js environments, this patch will "just work" without the user needing to make any code changes.
- If the SVG back-end is used in browsers, this patch will require that `isOffscreenCanvasSupported: false` is added to the `getDocument`-call.
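For example, something along these lines (the file name is made up):

```js
import { getDocument } from "pdfjs-dist";

const loadingTask = getDocument({
  url: "example.pdf",
  // Opt out of the new ImageMask data-format, since the SVG back-end
  // doesn't support it.
  isOffscreenCanvasSupported: false,
});
const pdfDocument = await loadingTask.promise;
```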
While it can't hurt to localize the main error-messages, also localizing the error *details* has always seemed somewhat unnecessary since those are only intended for debugging/development purposes. However, I can understand why that's done since the GENERIC viewer used to expose this information in the UI; via the `errorWrapper` UI that's removed in PR 15533.
At this point, when any errors are simply logged in the console, it no longer seems necessary to keep localizing the error *details* in the default viewer.
*Please note:* The referenced issue is the only mention that I can find, in either GitHub or Bugzilla, of "GoToE" actions.
Hence why I've purposely settled for a very simple, and partial, "GoToE" implementation to avoid complicating things initially.[1] In particular, this patch only supports "GoToE" actions that reference the /EmbeddedFiles-dict in the PDF document.
See https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2048909
---
[1] I always prefer having *real-world* test-cases to work with, whenever I'm implementing new features.
This is yet another small piece of clean-up of the `FontLoader`-code, since we've not used this `id`-property for anything ever since PR 6571 (which landed almost seven years ago). Furthermore, by default we're also not even using that code-path now since the Font Loading API will always be used when available.
*Please note:* This is tagged `[api-minor]` since it's technically observable from the outside, however no user ought to be directly interacting with these CSS font rules.
This commit fixes the "Expected null to equal '401R'" errors that
surfaced after the Puppeteer 18 upgrade. Note that even before that
this would have been an improvement because it takes some time between
scripting being reported ready (i.e., triggering the execution of any
OpenActions) and those OpenActions actually completing execution, so
it's only safe to check which element is focused if we know an element
actually became focused.
In the Firefox PDF Viewer this has never been used, with the error message simply printed in the web-console, and (somewhat) recently we've also updated the viewer code to avoid bundling the relevant code there. Furthermore, in the Firefox PDF Viewer we're not even displaying the *browser* fallback bar any more; see https://bugzilla.mozilla.org/show_bug.cgi?id=1705327.
Hence it seems slightly strange to keep this UI around in the GENERIC viewer, and this patch proposes that we simply remove it to simplify/unify the relevant code in the viewer. In particular this also allows us to remove a couple of l10n-strings, which have always been unused in the Firefox PDF Viewer.
Currently the compatibility-file is loaded using a standard `import`-statement and while its code is enclosed in a pre-processor block, and thus is excluded in e.g. the MOZCENTRAL build-target, it still results in the *built* `pdf.js`/`pdf.worker.js` files having an effectively empty closure as a result.
By moving the checks from `src/shared/compatibility.js` and into `src/shared/util.js` instead, we can load the file using a build-time `require`-statement and thus avoid that closure.
Note that with these changes the compatibility-file will no longer be loaded in development mode, i.e. when `gulp server` is used. However, this shouldn't be a big issue given that none of its included polyfills could be loaded then anyway (since `require`-statements are being used) and that it's really only intended for the `legacy`-builds of the library.
Rather than "manually" looking up the l10n-string and then updating the button, we can (and probably even should) just update the l10n-id and then trigger proper translation for the button DOM-element.
This extends the approach used in PresentationMode to also cover the AnnotationEditor, and tries to handle the combination of both cases correctly.
In order to simplify the overall implementation we simply track the *first* seen "previous" cursorTool, and don't allow it to be reset as long as either PresentationMode or an AnnotationEditor is being used.
Note that this PR only adds the "underscore"-variant of *actually existing* ligatures, however the referenced PDF document also uses a couple of non-standard ones (e.g. `ft`, `Th`, and `fh`) that we cannot easily support without larger changes (since they don't have official Unicode-entries).
Given that it's clearly the PDF document, and its fonts, that's the culprit here it's not entirely clear to me that we actually want to attempt a larger refactoring/rewriting of the `glyphlist.js` code, assuming it's even generally possible. Especially when this patch alone already improves our copy-paste behaviour when compared to both Adobe Reader and PDFium, and that this is only the *second* time this sort of bug has been reported.
Fewer dependencies shouldn't be a bad idea in general, and given that the `node-canvas` package already includes a `DOMMatrix` polyfill we can simply use that one instead.
Given that Firefox supports *synchronous* font loading, when the Font Loading API isn't being used, there's really no point including code which if called would just throw in the MOZCENTRAL build. (This is safe, since the `FontLoader.isSyncFontLoadingSupported`-getter always returns `true` there.)
After the changes in PR 10539 (which landed over three years ago) the `FontLoader.bind` method can only be called with *a single* font at a time, hence the `_prepareFontLoadEvent` method obviously doesn't need to support multiple fonts any more.
By having just *one* class, and using pre-processor blocks directly in the relevant methods, we reduce the size of this code in the *built* `pdf.js` file.
Originally, when the `BaseFontLoader` abstraction was added in PR 9982, the idea was probably that additional build-targets would get their own implementations. Given that this hasn't happened in the four years since that landed, it doesn't seem meaningful to keep it around.
The existing `loadingContext` class-property can be simplified slightly, since we've not been using the `id`-property on the requests ever since PR 3477 (which landed nine years ago).
Furthermore, by default we're also not even using that code-path now since the Font Loading API will always be used when available.
Currently the `viewBookmark`-button, which is actually a `href`-element, gets an inconsistent `outline`.
Similarly, the `dialog`-buttons also have an inconsistent `outline` after the changes in PR 15438.
Finally, simplifies a couple of `border` rules since setting a border-width when "none" is being used doesn't seem meaningful.
This was done all the way back in PR 8361, for a mozilla-central test that's since been removed. As can be seen in the following search results, there's no `LoopbackPort` invocation outside of the PDF.js code itself: https://searchfox.org/mozilla-central/search?q=LoopbackPort&path=
Given that the `LoopbackPort` is only used in connection with "fake workers", which is something that we don't officially recommend/support, this doesn't seem like functionality that we want to keep exposing in the public API.
The changes in PR 15438 added a `border-radius` when input-elements are focused, however there's no radius when the same elements are hovered. Having the radius change, and not just the `border-color`, when input goes from hovered to focused feels a bit inconsistent (at least to me).
OperatorList.addOp can trigger a flush if required, hence the values passed to it must
be correctly initialized in order to avoid wrong values reaching the renderer.
Because of this a clip path was considered empty and nothing was clipped, hence the wrong
rendering in bug 1791583.
*This effectively replaces PR 15465.*
As outlined in https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map/forEach, the argument order when iterating through a `Map` is actually `value, key`.
Ignoring the incorrect Array used in the old code, I cannot imagine that this would've worked anyway since we didn't use the actual `setTimeout`-functionRefs to clear the timeouts; please refer to the `setTimeout`/`setInterval` methods in the `SandboxSupportBase.createSandboxExternals` method.
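For reference, a tiny example of the correct argument order:

```js
const timeouts = new Map([[1, () => console.log("timeout callback")]]);

timeouts.forEach((value, key) => {
  // `value` is the stored function reference, `key` is the Map key —
  // not the other way around.
  console.log(typeof value, key); // "function" 1
});
```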
Since there is no scripting engine with XFA, the FormCalc parser is not used in real life.
The bug @nmtigor noticed was hidden by another one (the wrong check on `match`).
Some z-index values have been added in the annotation layer because the elements inside are re-ordered
in order to improve accessibility.
Hence we must add a "high" z-index on the annotation editor layer in order to avoid any bad
interaction between the different layers.
Most of the `String.prototype.search` call-sites found throughout the code-base are actually not necessary, since we usually only want a *boolean*, and those can be replaced with `RegExp.prototype.test` instead.
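An illustrative before/after, assuming a typical call-site (the variable name is made up):

```js
// Before — `search` returns an index that's then coerced into a boolean:
if (fontName.search(/bold/i) !== -1) {
  // ...
}

// After — `test` returns the boolean directly:
if (/bold/i.test(fontName)) {
  // ...
}
```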
The default outline for a focused text input is not that bad, but for some reason, when changing
the background color, all the good default border/outline properties are lost (it's the same
behaviour in Edge).
So in order to have something consistent in HCM/non-HCM, a 2px-border + 1px-outline (on @MReschenberg's
advice) is added when an input is focused, with different colors depending on HCM.
While working on the above issue, I noticed a few bugs in HCM, which I fixed:
- input, button and select have some default properties which were created at a time when the
annotation layer didn't exist, hence this patch removes them and sets those properties where
they should live;
- some elements (like the main toolbar) use a box-shadow which is invisible in HCM, hence
it's replaced by a border-bottom in HCM;
- some separators are invisible in HCM, hence the GrayText color is used to render them correctly;
- the options for the zoom selection were invisible in HCM with Desert (one of the Windows 11
themes).
By force-quitting the browser while the FullScreen API is active, we don't get a chance to exit PresentationMode *cleanly* and some of its state thus remains (via the `ViewHistory`).
To try and improve things here we can skip updating the Scroll/Spread-mode while PresentationMode is active, since they will be changed when entering PresentationMode, which seems to help and is really the best that we can do here (and what the issue describes is very much an edge-case anyway).
In the `legacy`-builds we (obviously) support the currently maintained Firefox ESR version, and looking at the [release history](https://wiki.mozilla.org/Release_Management/Calendar) those are officially supported (by Mozilla) for about 1-1.5 years.
However, for non-Firefox browsers the `legacy`-builds currently attempt to "support" browsers that are approximately *three* years old.[1] Historically, in the PDF.js project, trying to support old browsers has caused some maintenance problems and even delayed adoption of new web-platform features/functionality.
To lessen the support burden, given that the primary purpose of the PDF.js library is still to develop the *built-in* Firefox PDF Viewer, this patch proposes that the upcoming *major* release changes the minimum supported browsers/environments as follows:
- Chrome 85, which was released on 2020-08-25; see https://en.wikipedia.org/wiki/Google_Chrome_version_history
- Firefox ESR (as before); see https://wiki.mozilla.org/Release_Management/Calendar
- Safari 14, which was released on 2020-09-16; see https://en.wikipedia.org/wiki/Safari_version_history#Safari_14
- Node.js 14 (as before), which is now explicitly listed to prevent it from accidentally breaking; see https://en.wikipedia.org/wiki/Node.js#Releases
---
[1] In older browsers some functionality may not be available and generally we'll ask users to update to a modern browser when bugs, specific to old browsers, are being reported.
The [official Chrome extension](https://chrome.google.com/webstore/detail/pdf-viewer/oemmndcbldboiebfnladdacbdfmadadm) has unfortunately not been updated for *three years*, which means that it's currently missing out on years worth of bug fixes, performance improvements, and new features.
In particular, the Chrome extension suffers from a known bug with non-embedded standard fonts; see issue 13669 for details.
For the time being, this patch proposes that we *temporarily* make the following changes:
- Remove the mention of the official Chrome extension from the main README, since it seems unfortunate to somewhat prominently recommend users an old and partially non-working extension.
- Don't run the `gulp lint-chromium` task as part of the CI, since in addition to the official extension not having been updated its code is also not being actively maintained.[1]
Once the official Chrome extension has been updated, and it's being actively maintained again, this patch should be simple enough to revert.
---
[1] The last commits, which aren't e.g. linting or general code-maintenance related, happened a year ago now.
There's no point in having this variable defined (implicitly) as `undefined` in e.g. the Firefox PDF Viewer.
By defining it with `var` and using an ESLint ignore, rather than `let`, we can move it into the relevant pre-processor block instead. Note that since the entire viewer-code is placed, by Webpack, in a top-level closure this variable will thus not become globally accessible.
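A sketch of the pattern, using the project's `PDFJSDev` pre-processor convention and a made-up variable name:

```js
if (typeof PDFJSDev === "undefined" || PDFJSDev.test("GENERIC")) {
  // eslint-disable-next-line no-var
  var exampleGenericState = null; // `var` hoists out of the block, `let` wouldn't.
}
// In e.g. the MOZCENTRAL build the entire block above is stripped out at
// build-time, so the variable is never even defined there.
```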
After the changes in PR 15391 one separator may now become visible too soon when the viewer is narrow, applies e.g. to the MOZCENTRAL viewer, since the wrong CSS class is being used.
The reason that this happens is that only the GENERIC viewer includes the "openFile"-buttons, and we thus need the separator to also be conditionally defined.
This is a slightly speculative change, based on something that I happened to notice while browsing MDN, to hopefully prevent PDF.js from outright breaking in older browsers.
According to the following information on MDN, Safari didn't implement support for the necessary features until version 14:
- https://developer.mozilla.org/en-US/docs/Web/API/MediaQueryList#browser_compatibility
- https://developer.mozilla.org/en-US/docs/Web/API/MediaQueryList/change_event#browser_compatibility
Given the browsers that we currently support, only older versions of Safari should be affected, hence it seems reasonable to simply disable the functionality rather than trying to polyfill it.
(It's interesting how it's very often Safari which is *much* slower than the other browsers at implementing new features.)
After the changes in PR 14112 the `PDFViewer`-class is now "identical" to the `BaseViewer`-class and the `PDFSinglePageViewer`-class is just a very thin wrapper around the `BaseViewer`-class.
Hence we can rename these files, and also remove the abstract `BaseViewer`-class, which helps reduce some unnecessary "closures" in the *built* viewer.
*Please note:* These changes are made in two separate commits, to allow GitHub to preserve `blame` for the affected files.
This patch updates a bunch of older code, that makes conditional function calls, to use optional chaining rather than `if`-blocks.
These mostly mechanical changes reduce the size of the `gulp mozcentral` build by a little over 1 kB.
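A representative example of these mechanical changes (the names here are illustrative):

```js
// Before:
if (eventBus) {
  eventBus.dispatch("pagerendered", { source: pageView });
}

// After — optional chaining short-circuits when `eventBus` is nullish:
eventBus?.dispatch("pagerendered", { source: pageView });
```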
*Please note:* This is only a, hopefully generally helpful, work-around rather than a proper solution to issue 15292.
There's something that's "special" about the Type1 fonts in the referenced PDF document, since we don't manage to find any actual font programs and thus cannot render anything.
Given that it shouldn't make sense for a Type1 font program to ever be empty, since that means that there's no glyph-data to render, we simply fallback to a standard font to at least try and render *something* in these rare cases.
Given that the change in PR 13393 was slightly speculative, given the lack of test-cases, let's just revert part of that to fix the referenced issue.
Based on a quick look at old issues and existing test-cases, it seems that most (if not all) PDF documents that benefit from using the font-data in this way lack any /ToUnicode maps which should mean that they're unaffected by these changes.
Given that the official Bower website has, for almost five years, been advising users to utilize other tools it doesn't seem entirely necessary to keep including the `bower.json` file in the `pdfjs-dist` repository; see e.g. https://bower.io/blog/2017/how-to-migrate-away-from-bower/
This patch proposes removing the `browserify` example for the following reasons:
- The last `browserify` release was almost two years ago, according to both https://github.com/browserify/browserify/releases and https://www.npmjs.com/package/browserify?activeTab=versions
- The project no longer seems to be actively maintained, since so far this year there's only been *a single* (seemingly trivial) patch merged; see https://github.com/browserify/browserify/commits/master
- Because of the previous points `browserify` doesn't support modern and up-to-date JavaScript features, as evident from e.g. issue 14731 and multiple issues found in https://github.com/browserify/browserify/issues
- Our `browserify` example is most likely not very commonly used, judging by the very low volume of issues/PRs related to it. Looking at the `git` history of that example the only changes have been lint- or maintenance-related.[1]
- Providing an example for a framework that's no longer actively maintained doesn't seem like a good idea in general, since we probably don't want to steer users towards using (possibly) older frameworks.
- Given that we've never used `browserify` in the PDF.js project, it's also quite difficult to provide support for the example.
---
[1] It's interesting to compare with the `webpack` example, since that's generated both issues *and* also PRs (for missing features) from users.
Note that this patch implements the `SetOCGState`-handling in `PDFLinkService`, rather than as a new method in `OptionalContentConfig`[1], since this action is nothing but a series of `setVisibility`-calls and that it seems quite uncommon in real-world PDF documents.
The new functionality also required some tweaks in the `PDFLayerViewer`, to ensure that the `layersView` in the sidebar is updated correctly when the optional-content visibility changes from "outside" of `PDFLayerViewer`.
---
[1] We can obviously move this code into `OptionalContentConfig` instead, if deemed necessary, but for an initial implementation I figured that doing it this way might be acceptable.
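To make that concrete, a rough sketch of the idea (the exact action format is simplified here):

```js
async function executeSetOCGState(pdfViewer, action) {
  const optionalContentConfig = await pdfViewer.optionalContentConfigPromise;

  // The action is essentially nothing but a series of `setVisibility`-calls.
  for (const { id, visible } of action.state) {
    optionalContentConfig.setVisibility(id, visible);
  }
  // Re-assigning the promise triggers re-rendering, and lets e.g. the
  // `PDFLayerViewer` update its `layersView` in the sidebar.
  pdfViewer.optionalContentConfigPromise = Promise.resolve(
    optionalContentConfig
  );
}
```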
It slightly helps to reduce the code size and its complexity.
But the cool thing is that it allows copying/pasting some annotations from one PDF
to another.
Apparently this is implemented in e.g. Adobe Reader, and the specification does support it, however it can hardly be common in real-world PDF documents since it took over ten years for this feature to be requested.
A number of Annotation-types are currently creating their own PopupAnnotations, since they need to use a custom `trigger`-element. However, because of where that check is currently implemented[1] we end up attaching empty/unused containers for those PopupAnnotations to the DOM[2]; see e.g. the `annotation-line.pdf` file in the test-suite for one example.
By instead moving the types-check into the `PopupAnnotationElement` constructor, we can completely skip those PopupAnnotations that are being explicitly handled elsewhere.
Note that I don't *believe* that this is a new issue, although I've not tried to bisect it, but this likely goes back quite some time (possibly even as far as PR 8228).
---
[1] In the `PopupAnnotationElement.render` method.
[2] Please note that the actual Popup-element *itself* isn't being attached/rendered here, just its container which by itself serves no purpose as far as I can tell.
There are three notable exceptions here:
- The `saveDocument` one is converted into a permanent `warn`, since it still works when the `annotationStorage` is empty although it's (obviously) less efficient than `getData`.
- The `fallbackWorkerSrc` functionality (for browsers), since just removing it would risk too much third-party breakage.
- The SVG back-end, since a final decision is yet to be made. (It might be completely removed, or left as-is in an essentially "frozen" state.)
Note that this patch prepends the document title with "* ", rather than only "*" as suggested in the bug, since there's nothing that says that a PDF document cannot specify a title[1] beginning with an asterisk. To reduce possible confusion, having a space between the "editing marker" and the actual document title thus cannot hurt as far as I'm concerned.
In order to notify the viewer when all `AnnotationEditor`s have been removed, we utilize the existing `onAnnotationEditor`-callback to allow the document title to be updated as necessary.
Finally, this patch makes the following (slightly unrelated) changes:
- Rename the `AnnotationStorage.removeKey` method to just `AnnotationStorage.remove` instead. This is consistent with e.g. the `has`-method and should suffice to explain what it does.
- Remove the `AnnotationStorage.hasAnnotationEditors` getter, since the viewer now tracks the necessary state internally. This avoids unnecessarily having to iterate through the `AnnotationStorage`-instance when saving/printing the document.
---
[1] Using either an /Info dictionary or a /Metadata stream.
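In effect (sketch):

```js
function updateDocumentTitle(baseTitle, hasAnnotationEditors) {
  // Note the space after the asterisk, since a PDF document could itself
  // specify a title beginning with "*".
  document.title = hasAnnotationEditors ? `* ${baseTitle}` : baseTitle;
}

updateDocumentTitle("report.pdf", true); // Title becomes "* report.pdf".
```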
This functionality has never been used anywhere in the PDF.js library/viewer itself, since it was added in 2013.
Furthermore this functionality is, and has always been, *completely untested* and also unmaintained.
Finally, there's (at least) one old issue about `appendImage` not returning the correct position; see issue 4182.
All-in-all, it seems that keeping very old, untested, unmaintained, and partially broken code around probably isn't what we want here.
(On the off-chance that any future a11y-work requires getting access to image-positions, it'd likely be much better to re-implement the necessary functionality from scratch and also make sure that it's properly tested from the beginning.)
This old method, which is only used with the `imageLayer` functionality, is essentially just a re-implementation of the existing `Util.applyTransform` method.
The password dialog can be cancelled in three different ways:
- By clicking on its "Cancel"-button.
- By pressing the Escape-key.
- By force-opening another dialog, although this shouldn't happen in practice.
Here the "Cancel"-button case is slightly special since it'll trigger `PasswordPrompt.#cancel` *twice*, first directly via the click and secondly via the "close" event on the `dialog`-element.
While this shouldn't, as far as I know, cause any bugs it's nonetheless inconsistent with the other cases outlined above. To improve this we can simply attempt to *close* the password dialog instead, and then rely on the "close" event to run the `PasswordPrompt.#cancel` method.
Currently we simply use the Babel `preset-env` in the `legacy`-builds of the PDF.js library. This has the side-effect of transpiling the code for *very old* browsers/environments, including ones that we (since many years) no longer support which unnecessarily bloats the size of the `legacy`-builds.
For the CSS files we're only targeting *the supported browsers*, and it's thus possible to extend that to also apply to Babel.
One of the most significant changes, with this patch, is that we'll no longer polyfill `async`/`await` in the `legacy`-builds. However, this shouldn't be an issue given the browsers that we currently support in PDF.js; please refer to:
- https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
- https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/async_function#browser_compatibility
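A sketch of the idea, re-using the supported-browsers list from above (the exact query string is an assumption, not the actual gulpfile code):

```js
const babelLegacyOptions = {
  presets: [
    [
      "@babel/preset-env",
      // Only transpile what the supported browsers actually need,
      // rather than targeting *very old* environments.
      { targets: "Chrome >= 85, Firefox ESR, Safari >= 14, Node >= 14" },
    ],
  ],
};
```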
Currently in `disableWorker=true` mode it's possible that opening of password-protected PDF documents outright fails, if an *incorrect* password is entered. Apparently the event ordering is subtly different in the non-Worker case, which causes the password-callback to be updated *before* the dialog has been fully closed.
To avoid that we'll utilize a `PromiseCapability` to keep track of the state of the password dialog, such that we can delay both re-opening and (importantly) updating of the password-callback until doing so is safe.
This patch *may* also fix issue 15330, but it's impossible for me to tell.
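A rough sketch of the approach, using the `createPromiseCapability` helper (a { promise, resolve, reject } triple); the method bodies are heavily simplified:

```js
class PasswordPrompt {
  #activeCapability = null;

  async open() {
    if (this.#activeCapability) {
      // Delay re-opening, and updating of the password-callback, until
      // the previous dialog has been *fully* closed.
      await this.#activeCapability.promise;
    }
    this.#activeCapability = createPromiseCapability();
    // ...actually show the <dialog> element...
  }

  close() {
    // ...hide the <dialog> element...
    this.#activeCapability.resolve();
  }
}
```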
*This is a follow-up to PR 14869.*
In the old code we're accidentally "swallowing" part of the event-details, which explains why the annotationLayer didn't render.
One thing that made debugging a lot harder was the lack of error messages, from the viewer, and a few `PDFPageView`-methods were updated to improve this situation.
- Remove the `typeof Worker` check, since all browsers have had `Worker` support for many years now; see https://developer.mozilla.org/en-US/docs/Web/API/Worker#browser_compatibility
Furthermore the `new Worker(...)` call is wrapped in try-catch, which means that we'll still fallback to "fake workers" if necessary.
- Limit the `fallbackWorkerSrc` handling, in the `PDFWorker.workerSrc` getter, to only GENERIC builds since that's the only place where it's defined anyway.
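The relevant fallback logic, roughly (simplified; the fake-worker setup call stands in for the existing fallback code path):

```js
try {
  const worker = new Worker(workerSrc);
  // ...attach message handlers and complete the setup...
} catch {
  // Worker creation can still fail, e.g. due to file:// URLs or CSP
  // restrictions, in which case we fall back to "fake workers" where
  // the parsing instead runs on the main-thread.
  this._setupFakeWorker();
}
```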
Given that the code is written with JavaScript module-syntax, none of this functionality will "leak" outside of this file with these changes.
By removing this closure the file-size is decreased, even for the *built* `pdf.worker.js` file, since there's now less overall indentation in the code.
This was moved into the `src/display/`-folder in PR 15110, for the initial editor-a11y patch. However, with the changes in PR 15237 we're again only using `binarySearchFirstItem` in the `web/`-folder and it thus seem reasonable to move it back there.
The primary reason for moving it back is that `binarySearchFirstItem` is currently exposed in the public API, and we always want to avoid that unless it's either PDF-related functionality or code that simply must be shared between the `src/`- and `web/`-folders. In this case, `binarySearchFirstItem` is a general helper function that doesn't really satisfy either of those alternatives.
Currently when the `TextAccessibilityManager.enabled` method is called, we'll update `aria-owns` for any pre-existing elements. This obviously makes sense when e.g. zooming/rotating in the viewer, since the annotationLayer/annotationEditorLayer is kept in those cases.
However when the page is *fully* reset, e.g. as result of going out-of-view and thus being evicted from the cache, we still keep the `#textNodes`-Map around. This causes us to set the `aria-owns` attribute (in the textLayer) for an element that doesn't actually exist any more, which as far as I'm concerned seems incorrect. In this case the element will simply, as already implemented, be re-inserted when the annotationLayer/annotationEditorLayer renders again.
Given that the code is written with JavaScript module-syntax, none of this functionality will "leak" outside of this file with these changes.
For e.g. the `gulp mozcentral` command the *built* `pdf.worker.js` file-size decreases `~2 kB` with this patch, and most of the improvement comes from having less overall indentation in the code.
Given that the code is written with JavaScript module-syntax, none of this functionality will "leak" outside of this file with these changes.
By removing this closure the file-size is decreased, even for the *built* `pdf.worker.js` file, since there's now less overall indentation in the code.
This patch doesn't structurally change the text layer: it just adds some aria-owns
attributes to some spans.
The aria-owns attribute expects an element id, which is why this patch adds back an
id on the element rendering an annotation; this id is built using crypto.randomUUID
to avoid any potential issues with the hash in the URL.
The elements in the annotation layer are moved in the DOM in order to have them in the
same "order" as they visually appear.
The overall goal is to help screen readers present the annotations to the user as
they visually appear and as they come in the text flow.
It is clearly not perfect, but it should improve readability for some people with visual
disabilities.
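Roughly how the linking works (sketch; `textLayerSpan` is assumed to be the span the annotation visually follows):

```js
// Give the annotation's element a collision-free id, rather than e.g. an
// id derived from the annotation reference which could clash with the
// hash in the URL.
const annotationSection = document.createElement("section");
annotationSection.id = `pdfjs_internal_id_${crypto.randomUUID()}`;

// Point the corresponding text layer span at it, so that screen readers
// encounter the annotation at the right place in the text flow.
textLayerSpan.setAttribute("aria-owns", annotationSection.id);
```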
According to MDN `Path2D` is available in all browsers that we currently support, see https://developer.mozilla.org/en-US/docs/Web/API/Path2D#browser_compatibility
Hence only Node.js is currently lagging behind here, and requires that we keep the old code as a fallback in the `compileType3Glyph` function. However, there's an open PR in the `node-canvas` repository for adding `Path2D` support.
As far as I'm concerned, there are two possible solutions here:
- We land this patch now, since it removes unnecessary code in e.g. the Firefox PDF Viewer, which means that compilation of Type3 glyphs will be disabled in Node.js until that PR is landed.[1]
If users report bugs about Type3 glyphs looking "inconsistent" in Node.js and/or being slow to render, we could perhaps encourage them to upvote and otherwise help out getting that PR landed?
- We wait for the mentioned PR to land *first*, before moving forward with this patch. Given that there's been no updates on that PR for almost two months, this alternative may possibly take a while.
---
[1] Note that Type3 fonts are first of all not very common in PDF documents, and secondly that compilation only applies specifically to Type3 glyphs that contain /ImageMask-data (i.e. not all Type3 fonts are affected).
This exports the same constants as the viewer components, but in the default viewer. To avoid bloating the global-scope the constants are added to a new `PDFViewerApplicationConstants` object[1], which also allows us to skip this in builds where it's not actually needed (e.g. the Firefox *built-in* PDF Viewer).
*Please note:* I'm not completely sold on this idea, and thus wouldn't mind the patch being rejected, since we probably don't want to export every single viewer constant this way. (And it may seem a bit arbitrary, to users, why some constants are exported and others are not.)
---
[1] Somewhat similar to the existing `PDFViewerApplicationOptions` structure.
In addition to the existing `LinkTarget` constant, used by the `PDFLinkService`-constructor, this patch exports the following constants in the viewer components:
- `ScrollMode` and `SpreadMode`, since the `BaseViewer` has getters/setters which work with those constants.
- `RenderingStates`, since that one may be helpful when using `PDFPageView` directly.
While this has always worked, as a consequence of the implementation, it's never been officially supported.
In addition to adding basic unit-tests, this patch also introduces a couple of new JSDoc `@typedef`s in the API to avoid overly long lines.
By invoking the `reset` methods *last* in the `Toolbar`/`SecondaryToolbar`-constructors, we ensure that the "toolbarreset"/"secondarytoolbarreset"-events are actually handled when the viewer loads. Note that previously those events were dispatched *before* the relevant event-listeners had been attached.
With this small change we can avoid inconsistent initial toolbar-state, specifically in the case when the viewer is *reloaded* (since Firefox keeps the HTML-state on "soft" reloads).
By doing this in the worker-thread this code will only need to run *once*, whereas currently re-rendering of a page forces this to be repeated (e.g. after it's been scrolled out-of-view and then back into view again).
When a FreeText editor is pasted it doesn't have an editorDiv yet when added
to the layer, hence it's empty.
So this patch just moves the call to addToAnnotationStorage to ensure we have
what we need.
An annotation doesn't have to be in the text flow, hence it's likely a bad idea
to insert its text in the text layer. But the text must be visible from a screen
reader point of view, so it must be somewhere in the DOM.
So with this patch, the text from a FreeText annotation is extracted and added to
a div in its HTML counterpart, and with patch #15237 the text should be visible
and positioned relative to the text flow.
Given that this image is intended specifically for the default viewer, we simply use the CSS preprocessor to remove the image reference in the `gulp components` build.
Considering that the issue only affects a CSS file, I don't believe that replacing the *just released* PDF.js version is actually necessary here.
It doesn't make sense to use a page-canvas that's *smaller* than the resulting thumbnail, since that causes the image to be upscaled which results in a blurry thumbnail. Note that this doesn't normally happen, unless a very small zoom-level is used in the viewer.