Commit Graph

3051 Commits

Author SHA1 Message Date
Jonas Jenwald
58b5eb89b8
Merge pull request #16315 from Snuffleupagus/annotationLayer-CSS-is
Introduce some `:is` usage in the annotationLayer CSS
2023-04-19 15:32:10 +02:00
Jonas Jenwald
5119e7fd6a
Merge pull request #16313 from Snuffleupagus/textLayer-CSS-is
Introduce some `:is` usage in the textLayer CSS
2023-04-19 15:17:22 +02:00
Calixte Denizet
19ca41896e Correctly clip the text in the text layer (fixes #16316) 2023-04-18 17:00:42 +02:00
Jonas Jenwald
fcc535706a Introduce some :is usage in the annotationLayer CSS
While this slightly reduces duplication in the CSS rules, some of the auto-formatting done by Prettier is perhaps not great. (Given the overall advantage of using Prettier, we'll probably have to simply accept this.)
2023-04-18 12:42:13 +02:00
Jonas Jenwald
5cb99321d7 Introduce some :is usage in the textLayer CSS 2023-04-18 11:39:09 +02:00
Calixte Denizet
117bbf7cd9 [api-minor] Don't normalize the text used in the text layer.
Some arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized
when creating the search query. So to avoid to duplicate the normalization code,
everything is moved in the find controller.
The previous code to normalize text was using NFKC but with a hardcoded map, hence it
has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size
by 30kb).
In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into
account some RTL unicode ranges, the generated font wasn't embedding the mapping this
char and the unicode ranges in the OS/2 table weren't up-to-date.

When normalized some chars can be replaced by several ones and it induced to have
some extra chars in the text layer. To avoid any regression, when copying some text
from the text layer, a copied string is normalized (NFKC) before being put in the
clipboard (it works like this in either Acrobat or Chrome).
2023-04-17 14:31:23 +02:00
calixteman
3e08eee511
Merge pull request #16301 from calixteman/issue16278
[Editor] Take into account the initial rotation (issue #16278)
2023-04-17 09:42:07 +02:00
Calixte Denizet
8e5f4c0622 [Editor] Take into account the initial rotation (issue #16278) 2023-04-16 21:36:26 +02:00
Tim van der Meij
f46ed43b81
Merge pull request #16247 from Snuffleupagus/issue-7442
[api-minor] Add support, in `PDFFindController`, for mixing phrase/word searches (issue 7442)
2023-04-16 14:23:41 +02:00
Calixte Denizet
ca54ea12b3 Add the possibility to copy all the pdf text whatever the rendered pages are (bug 1788035) 2023-04-15 18:59:40 +02:00
Jonas Jenwald
0e19c3a120 [api-minor] Add support, in PDFFindController, for mixing phrase/word searches (issue 7442)
*Please note:* This patch only extends the `PDFFindController` implementation itself to support this functionality, however it's *purposely* not exposed in the default viewer.

This replaces the previous `phraseSearch`-parameter, and a `query`-string will now always be interpreted as a phrase-search.
To enable searching for individual words, the `query`-parameter must instead consist of an Array of strings. This way it's now also possible to combine phrase/word searches, with a `query`-parameter looking something like `["Lorem ipsum", "foo", "bar"]` which will search for the phrase "Lorem ipsum" *and* the words "foo" respectively "bar".
2023-04-15 13:32:37 +02:00
calixteman
7571842d84
Merge pull request #16275 from calixteman/ifx_search_with_fractions
Fix search of numbers inside fractions
2023-04-11 21:52:56 +02:00
Calixte Denizet
d8795f9f8f Fix search of numbers inside fractions 2023-04-11 20:57:26 +02:00
Jonas Jenwald
3a36a9d337
Merge pull request #16268 from Snuffleupagus/RegionalImageCache
Attempt to also cache images at the "page"-level (issue 16263)
2023-04-11 12:06:29 +02:00
calixteman
c1c372c320
Merge pull request #16225 from calixteman/16224
Thin whitespaces must have their own span
2023-04-11 11:13:16 +02:00
Jonas Jenwald
9881dbf927 Attempt to also cache images at the "page"-level (issue 16263)
Currently we have two separate image-caches on the worker-thread:
 - A local one, which is unique to each `PartialEvaluator.getOperatorList` invocation. This one caches both names *and* references, since image-resources may be accessed in either way.
 - A global one, which applies to the entire PDF documents and all its pages. This one only caches references, since nothing else would work.

This patch introduces a third image-cache, which essentially sits "between" the two existing ones. The new `RegionalImageCache`[1] will be usable throughout a `PartialEvaluator` instance, and consequently it *only* caches references, which thus allows us to keep track of repeated image-resources found in e.g. different /Form and /SMask objects.

---
[1] For lack of a better word, since naming things is hard...
2023-04-10 11:34:41 +02:00
Jonas Jenwald
5063a6f2a9 [api-minor] Remove the disableCombineTextItems option
*Please note:* This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons.

This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous *default* text-extraction behaviour.
However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in *more* (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together.

Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.
2023-03-30 14:23:38 +02:00
Calixte Denizet
4b7eb1436d Thin whitespaces must have their own span 2023-03-29 11:23:58 +02:00
Calixte Denizet
a96f10e55d Create a new chunk when the char is too rised compared to the previouse one 2023-03-28 13:56:46 +02:00
Jonas Jenwald
a4dfa04a0b Enable the declaration-block-no-redundant-longhand-properties Stylelint rule
Note that these changes were done automatically, using `gulp lint --fix`.
This rule will help avoid unnecessary repetition in the CSS; please see https://stylelint.io/user-guide/rules/declaration-block-no-redundant-longhand-properties/
2023-03-25 10:08:27 +01:00
Jonas Jenwald
1fc09f0235 Enable the unicorn/prefer-string-replace-all ESLint plugin rule
Note that the `replaceAll` method still requires that a *global* regular expression is used, however by using this method it's immediately obvious when looking at the code that all occurrences will be replaced; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll#parameters

Please find additional details at https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-string-replace-all.md
2023-03-23 12:57:10 +01:00
Jonas Jenwald
5f64621d46 Use String.prototype.replaceAll() where appropriate
This fairly new method allows replacing *multiple* occurrences within a string without having to use regular expressions.

Please refer to:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll#browser_compatibility
2023-03-22 15:31:10 +01:00
Jonas Jenwald
915bdd6576
Merge pull request #16173 from Snuffleupagus/inset
Introduce `inset` usage in the CSS files
2023-03-22 12:57:57 +01:00
Jonas Jenwald
137a2d6e30 Add even more non-standard ligatures (PR 15517 follow-up)
Given that we already create multi-byte ToUnicode entries in other cases, see e.g. the `getNormalizedUnicodes` table, this is hopefully fine.
2023-03-22 10:42:52 +01:00
Jonas Jenwald
9321758d91
Merge pull request #16186 from Snuffleupagus/issue-16176
Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176)
2023-03-21 22:17:18 +01:00
Jonas Jenwald
d4bcfe8c16 Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176)
Hopefully this makes sense, since we already "create" multi-byte ToUnicode entries in other cases (see e.g. the `getNormalizedUnicodes` table).
2023-03-21 21:35:57 +01:00
calixteman
8bfebf1c24
Merge pull request #16188 from calixteman/bug1823296
Use the position of the previous xref stream if any when saving a pdf (bug 1823296)
2023-03-21 21:21:49 +01:00
Calixte Denizet
2d0f30a67c Use the position of the previous xref stream if any when saving a pdf (bug 1823296) 2023-03-21 19:27:24 +01:00
Jonas Jenwald
c4a725fe98 Fix the transfer parameter, for structuredClone, in the LoopbackPort
The way that we handle the `transfer` parameter is unfortunately wrong, ever since PR 14392 which introduced the code, given that the MDN article originally contained incorrect information; please see https://github.com/mdn/content/pull/23164

By updating the `structuredClone` call such that it works correctly, we can enable more unit-tests in Node.js environments; please refer to https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#parameters
2023-03-19 22:04:01 +01:00
Jonas Jenwald
553c2e05cd Introduce inset usage in the CSS files
The `inset` property is a nice shorthand that can be used to avoid having to specify the positions individually; please see
 - https://developer.mozilla.org/en-US/docs/Web/CSS/inset
 - https://developer.mozilla.org/en-US/docs/Web/CSS/inset-inline
2023-03-19 14:32:37 +01:00
Jonas Jenwald
0e54a3c37a Warn about missing/incorrect --scale-factor CSS-variable in renderTextLayer (issue 16139)
Unfortunately I don't believe that we can simply add a default `--scale-factor` CSS-variable to the `container`-element, since that might not be entirely appropriate/correct in all cases.[1]
However, we can at least print a console-error to hopefully make this situation more apparent to users. (This is purposely not using the `warn` helper-function, since those messages can be disabled.)

---
[1] One example is in our reference-tests, where we don't need to add it to the `container`-element itself.
2023-03-16 11:53:12 +01:00
Jonas Jenwald
fc055dbd80 [api-minor] Extend general transfer function support to browsers without OffscreenCanvas
This patch extends PR 16115 to work in all browsers, regardless of their `OffscreenCanvas` support, such that transfer functions will be applied to general rendering (and not just image data).
In order to do this we introduce the `BaseFilterFactory` that is then extended in browsers/Node.js environments, similar to all the other factories used in the API, such that we always have the necessary factory available in `src/display/canvas.js`.

These changes help simplify the existing `putBinaryImageData` function, and the new method can easily be stubbed-out in the Firefox PDF Viewer.

*Please note:* This patch removes the old *partial* transfer function support, which only applied to image data, from Node.js environments since the `node-canvas` package currently doesn't support filters. However, this should hopefully be fine given that:
 - Transfer functions are not very commonly used in PDF documents.
 - Browsers in general, and Firefox in particular, are the *primary* development target for the PDF.js library.
 - The FAQ only lists Node.js as *mostly* supported, see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
2023-03-14 13:09:08 +01:00
calixteman
b2a86350fc
Merge pull request #16096 from bungeman/fix_trig_functions
Correct PostScript trigonometric operators
2023-03-11 14:32:23 +01:00
Calixte Denizet
07b094729e Fix search in pdf a containing some UTF-32 characters (bug 1820909)
Some chars were supposed to have a length equals to 1 but UTF-32 chars
can be longuer.
2023-03-09 15:03:01 +01:00
calixteman
a0ef5a4ae1
Merge pull request #16115 from calixteman/issue16114
Apply transfer filters to any graphic commands
2023-03-08 14:53:41 +01:00
Jonas Jenwald
471aef5fc6 Support (rare) Type3 fonts with Pattern resources (issue 16127)
This simply extends the approach in PR 10727 to also cover Patterns, which shouldn't be a common occurrence in Type3 fonts (since this is the first issue we've seen).
2023-03-08 09:20:52 +01:00
Calixte Denizet
8304df2520 Apply transfer filters to any graphic commands 2023-03-07 22:17:19 +01:00
Calixte Denizet
b8dda089e2 Slightly modify the max width of a tracking space 2023-03-07 19:38:49 +01:00
calixteman
ec5288caa5
Merge pull request #16121 from calixteman/issue16120
Use appearance stream to render locked annotations (bug 1723568)
2023-03-07 15:43:50 +01:00
Calixte Denizet
8db77cc361 Use appearance stream to render locked annotations (bug 1723568) 2023-03-07 15:01:31 +01:00
Jonas Jenwald
2f2f1e5088 Revert "Update rimraf to version 4"
This reverts commit 32357e3d17.
2023-03-06 19:57:00 +01:00
Calixte Denizet
3849063d36 [Annotation] Don't rotate an annotation when it has the NoRotate flag 2023-03-06 17:27:11 +01:00
Calixte Denizet
05b0c9d7e6 Render large images even if they're larger than the canvas limits (bug 1720282)
The idea is to encode large image in BMP format (which is very simple and doesn't
require to compute any checksums) and then use createImageBitmap with a BMP blob
(which doesn't suffer of the Canvas/ImageData limits).
From a performance point of view, it isn't crazy (generating a large blob + decoding
it on the main thread is really not ideal) but at least we've something to display
which is a way better than a blank page (and one can notice that most of the time is
spent in decoding the image from the pdf stream).
2023-03-05 14:07:07 +01:00
Ben Wagner
158c836e26 Correct PostScript trigonometric operators
PDF 32000-1:2008 7.10.5.1 "Type 4 (PostScript Calculator) Functions"
defers to the PostScript Language Reference for the description of these
functions. The PostScript Language Reference, third edition chapter 8
"Operators" defines the `angle` type as a "number of degrees". Section
8.1 defines "angle `sin` real", "angle `cos` real", and "num den `atan`
angle". The documentation for `atan` further states that it will return
an angle in degrees between 0 and 360.

Handle these operators correctly in `PostScriptEvaluator.execute`.
Convert the inputs to `sin` and `cos` from degrees to radians for use
with `Math.sin` and `Math.cos`. Correctly pop two values from the stack
for `atan`, use `Math.atan2`, and convert from radians to (positive)
degrees.
2023-03-03 17:25:11 -05:00
Calixte Denizet
fd03cd5493 [api-minor] Generate images in the worker instead of the main thread.
We introduced the use of OffscreenCanvas in #14754 and this patch aims
to use them for all kind of images.
It'll slightly improve performances (and maybe slightly decrease memory use).
Since an image can be rendered in using some transfer maps but because of
OffscreenCanvas we don't have the underlying pixels array the transfer maps
stuff is re-implemented in using the SVG filter feComponentTransfer.
2023-03-01 17:40:12 +01:00
Jonas Jenwald
f42a2e8451
[api-minor] Move the canvasFactory option into getDocument
Rather than repeatedly initializing a `canvasFactory`-instance for every page, move it to the document-level instead.

*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
2023-03-01 09:07:16 +01:00
Calixte Denizet
3a21423386 [Acroform] Use the full path to find the node in the XFA datasets where to store the value
I noticed several 'Path not found' errors because of a field called #subform[2].
From the XFA specs, the hash is used for a class of elements in the template tree.
When we're looking for a node in the datasets tree, it doesn't make sense to search
for a class. Hence the path element starting with a hash are just skipped.
2023-02-23 12:09:39 +01:00
Calixte Denizet
dca54c8f8a [JS] Send a Validate action on change on Choice widget 2023-02-19 16:33:05 +01:00
Tim van der Meij
9fac676796
Merge pull request #16054 from Snuffleupagus/Driver-getDocument-cleanup
A little clean-up of the `getDocument` call in `test/driver.js`
2023-02-19 12:12:07 +01:00
Calixte Denizet
fc7d74385f Don't replace an eol by a whitespace when the last char is a Katakana-Hiragana diacritic 2023-02-16 11:31:58 +01:00