Commit Graph

2761 Commits

Author SHA1 Message Date
Calixte Denizet
117bbf7cd9 [api-minor] Don't normalize the text used in the text layer.
Some arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized
when creating the search query. So to avoid to duplicate the normalization code,
everything is moved in the find controller.
The previous code to normalize text was using NFKC but with a hardcoded map, hence it
has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size
by 30kb).
In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into
account some RTL unicode ranges, the generated font wasn't embedding the mapping this
char and the unicode ranges in the OS/2 table weren't up-to-date.

When normalized some chars can be replaced by several ones and it induced to have
some extra chars in the text layer. To avoid any regression, when copying some text
from the text layer, a copied string is normalized (NFKC) before being put in the
clipboard (it works like this in either Acrobat or Chrome).
2023-04-17 14:31:23 +02:00
Jonas Jenwald
c79bdd6ae6 Simplify the CFFCompiler.compileTypedArray method
Rather than manually creating the Array, we can use the now existing `Array.from` method instead.
2023-04-15 11:13:34 +02:00
Jonas Jenwald
0ce568e789 Remove CFFCompiler.compileGlobalSubrIndex since it's completely unused
This method was originally added in PR 1320, eleven years ago, however it doesn't appear to ever have been used (not even from the start).
Furthermore, this method also tries to access a property that doesn't exist (`this.out`) and then call a method that also doesn't exist (`writeByteArray`).
2023-04-15 11:13:21 +02:00
Jonas Jenwald
ab2773416b
Merge pull request #16291 from Snuffleupagus/issue-16289
Limit the `Path2D`-checks in the worker-thread to Node.js (PR 16238 follow-up, issue 16289)
2023-04-14 21:26:12 +02:00
Calixte Denizet
5eab8ec610 Avoid when it's possible to use Array.concat when compiling a CFF font
In looking at https://bugs.ghostscript.com/show_bug.cgi?id=706451 I noticed that bug2.pdf was pretty
slow to load for such a basic file.
In profiling I noticed that a lot of time is spent in Array.concat, hence this patch use Array.push when
it's possible (it's now ~3 times faster).
2023-04-14 19:01:01 +02:00
Jonas Jenwald
edd13895dd Limit the Path2D-checks in the worker-thread to Node.js (PR 16238 follow-up, issue 16289)
The changes in PR 16238 were intended specifically for Node.js environments, however they accidentally applied to older browsers as well.

*Please note:* In up-to-date browsers `Path2D` is available in Workers, which should be connected to the introduction of `OffscreenCanvas`.
2023-04-14 11:51:11 +02:00
Jonas Jenwald
3a36a9d337
Merge pull request #16268 from Snuffleupagus/RegionalImageCache
Attempt to also cache images at the "page"-level (issue 16263)
2023-04-11 12:06:29 +02:00
calixteman
c1c372c320
Merge pull request #16225 from calixteman/16224
Thin whitespaces must have their own span
2023-04-11 11:13:16 +02:00
Jonas Jenwald
9881dbf927 Attempt to also cache images at the "page"-level (issue 16263)
Currently we have two separate image-caches on the worker-thread:
 - A local one, which is unique to each `PartialEvaluator.getOperatorList` invocation. This one caches both names *and* references, since image-resources may be accessed in either way.
 - A global one, which applies to the entire PDF documents and all its pages. This one only caches references, since nothing else would work.

This patch introduces a third image-cache, which essentially sits "between" the two existing ones. The new `RegionalImageCache`[1] will be usable throughout a `PartialEvaluator` instance, and consequently it *only* caches references, which thus allows us to keep track of repeated image-resources found in e.g. different /Form and /SMask objects.

---
[1] For lack of a better word, since naming things is hard...
2023-04-10 11:34:41 +02:00
Tim van der Meij
13f2426aab
Merge pull request #16238 from Snuffleupagus/update-Node-compat-check
Update the Node.js compatibility-check in the worker-thread
2023-04-01 14:20:33 +02:00
Jonas Jenwald
57a307d0cd Update the Node.js compatibility-check in the worker-thread
*Please note:* In Node.js environments a `legacy`-build must be used since only those versions include any polyfills.

Previously we'd only check if `ReadableStream` is natively supported, however since Node.js version 18 that's now been implemented; please see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility
Hence we'll also check for the availability of `Path2D`, since that's browser-specific functionality not expected to be available in Node.js environments; please see https://developer.mozilla.org/en-US/docs/Web/API/Path2D#browser_compatibility
2023-03-30 18:36:15 +02:00
Jonas Jenwald
5063a6f2a9 [api-minor] Remove the disableCombineTextItems option
*Please note:* This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons.

This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous *default* text-extraction behaviour.
However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in *more* (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together.

Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.
2023-03-30 14:23:38 +02:00
Calixte Denizet
4b7eb1436d Thin whitespaces must have their own span 2023-03-29 11:23:58 +02:00
calixteman
622465dc20
Merge pull request #16223 from calixteman/16221
Create a new chunk when the char is too rised compared to the previous one
2023-03-28 15:30:14 +02:00
Calixte Denizet
a96f10e55d Create a new chunk when the char is too rised compared to the previouse one 2023-03-28 13:56:46 +02:00
Jonas Jenwald
d584513cb2
Merge pull request #16213 from Snuffleupagus/validateCSSFont-quotes
Reduce duplication in the `validateCSSFont` helper function
2023-03-28 12:40:23 +02:00
Jonas Jenwald
20cbb89412 Simplify the isPDFFunction helper function
Originally we used helper functions for checking if something was a Dictionary or Stream, and then having an initial `typeof` check probably made sense.
However, given that we're using `instanceof` nowadays the additional check longer seems necessary.
2023-03-27 11:34:20 +02:00
Jonas Jenwald
ef70988027 Reduce duplication in the validateCSSFont helper function
Currently we're *virtually* duplicating the same code, for validating quotation marks, twice in this helper function.

The size decrease is quite small (107 bytes) and this makes the code slightly harder to reader, hence I completely understand if this patch is rejected.
2023-03-26 12:12:49 +02:00
Jonas Jenwald
035a273d30 Use replaceAll in the recoverJsURL helper function
We can just do direct replacement when building the regular expression, rather than splitting the string into an Array and then re-joining it.
2023-03-25 12:31:39 +01:00
Jonas Jenwald
96e34fbb7d Enable the unicorn/prefer-negative-index ESLint plugin rule
Please see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-negative-index.md
2023-03-24 10:18:32 +01:00
Jonas Jenwald
1fc09f0235 Enable the unicorn/prefer-string-replace-all ESLint plugin rule
Note that the `replaceAll` method still requires that a *global* regular expression is used, however by using this method it's immediately obvious when looking at the code that all occurrences will be replaced; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll#parameters

Please find additional details at https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-string-replace-all.md
2023-03-23 12:57:10 +01:00
Jonas Jenwald
5f64621d46 Use String.prototype.replaceAll() where appropriate
This fairly new method allows replacing *multiple* occurrences within a string without having to use regular expressions.

Please refer to:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll#browser_compatibility
2023-03-22 15:31:10 +01:00
Jonas Jenwald
137a2d6e30 Add even more non-standard ligatures (PR 15517 follow-up)
Given that we already create multi-byte ToUnicode entries in other cases, see e.g. the `getNormalizedUnicodes` table, this is hopefully fine.
2023-03-22 10:42:52 +01:00
Jonas Jenwald
122d5e549a Track previous "XRefStm"s in a Set, rather than an Object
Having just reviewed a patch touching this code, I couldn't help noticing that an `Object` isn't really the optimal data-structure for this and nowadays we can do better by using a `Set` instead.
2023-03-22 09:41:19 +01:00
Jonas Jenwald
9321758d91
Merge pull request #16186 from Snuffleupagus/issue-16176
Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176)
2023-03-21 22:17:18 +01:00
Jonas Jenwald
d4bcfe8c16 Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176)
Hopefully this makes sense, since we already "create" multi-byte ToUnicode entries in other cases (see e.g. the `getNormalizedUnicodes` table).
2023-03-21 21:35:57 +01:00
Calixte Denizet
2d0f30a67c Use the position of the previous xref stream if any when saving a pdf (bug 1823296) 2023-03-21 19:27:24 +01:00
Jonas Jenwald
50c844c5b8 Stop including isOffscreenCanvasSupported in the "StartRenderPage" message
With the previous commit this is now completely unused in API, hence it can be removed. This is done in a separate commit to make it easier to re-instate it, would the need ever arise.
2023-03-14 13:09:20 +01:00
Tim van der Meij
9819f1cc6b
Merge pull request #16108 from Snuffleupagus/delay-cleanup
Slightly delay cleanup, after rendering, in documents with large images
2023-03-11 15:52:12 +01:00
calixteman
b2a86350fc
Merge pull request #16096 from bungeman/fix_trig_functions
Correct PostScript trigonometric operators
2023-03-11 14:32:23 +01:00
Jonas Jenwald
c0671ac133 Slightly increase the maximum image sizes that we'll cache
The current value originated in PR 2317, and in the decade that have passed the amount of RAM available in (most) devices should have increased a fair bit.
Nowadays we also do a much better job of detecting repeated images at both the page- and document-level, which helps reduce overall memory-usage in many documents.

Finally the constant is also moved into the `src/shared/util.js` file, since it was implicitly used on both the main- and worker-thread previously.
2023-03-08 17:06:10 +01:00
Jonas Jenwald
6839f15a32
Merge pull request #16128 from Snuffleupagus/issue-16127
Support (rare) Type3 fonts with Pattern resources (issue 16127)
2023-03-08 12:21:53 +01:00
Jonas Jenwald
e5427ab11b
Merge pull request #16122 from Snuffleupagus/rm-onUnsupportedFeature
[api-minor] Remove the deprecated `onUnsupportedFeature` functionality (PR 15758 follow-up)
2023-03-08 12:16:27 +01:00
calixteman
cc555a389b
Merge pull request #16117 from calixteman/workaround_bug1820511
Avoid to have a factor too close to 2 when downscaling image
2023-03-08 11:12:56 +01:00
Calixte Denizet
1617ee6c3f Avoid to have a factor too close to 2 when downscaling image
It's a workaround for bug 1820511: it only affects Firefox on Windows
using the D2D backend.
2023-03-08 11:05:46 +01:00
Calixte Denizet
e9474f1c84 [api-minor] Add an option to set the max canvas area 2023-03-08 10:37:06 +01:00
Jonas Jenwald
471aef5fc6 Support (rare) Type3 fonts with Pattern resources (issue 16127)
This simply extends the approach in PR 10727 to also cover Patterns, which shouldn't be a common occurrence in Type3 fonts (since this is the first issue we've seen).
2023-03-08 09:20:52 +01:00
Calixte Denizet
b8dda089e2 Slightly modify the max width of a tracking space 2023-03-07 19:38:49 +01:00
Calixte Denizet
8db77cc361 Use appearance stream to render locked annotations (bug 1723568) 2023-03-07 15:01:31 +01:00
Jonas Jenwald
2f3dcc2327 [api-minor] Remove the deprecated onUnsupportedFeature functionality (PR 15758 follow-up)
This was deprecated in PR 15758, which has now been included in three official PDF.js releases.
While PR 15880 did limit the bundle-size impact of this functionality on e.g. the Firefox PDF Viewer, it still leads to some unnecessary "bloat" that these changes remove.
Furthermore, with this being deprecated there'd also be no effort put into e.g. extending the `UNSUPPORTED_FEATURES` list when handling future error cases.
2023-03-07 10:18:43 +01:00
Calixte Denizet
3849063d36 [Annotation] Don't rotate an annotation when it has the NoRotate flag 2023-03-06 17:27:11 +01:00
Calixte Denizet
05b0c9d7e6 Render large images even if they're larger than the canvas limits (bug 1720282)
The idea is to encode large image in BMP format (which is very simple and doesn't
require to compute any checksums) and then use createImageBitmap with a BMP blob
(which doesn't suffer of the Canvas/ImageData limits).
From a performance point of view, it isn't crazy (generating a large blob + decoding
it on the main thread is really not ideal) but at least we've something to display
which is a way better than a blank page (and one can notice that most of the time is
spent in decoding the image from the pdf stream).
2023-03-05 14:07:07 +01:00
Ben Wagner
158c836e26 Correct PostScript trigonometric operators
PDF 32000-1:2008 7.10.5.1 "Type 4 (PostScript Calculator) Functions"
defers to the PostScript Language Reference for the description of these
functions. The PostScript Language Reference, third edition chapter 8
"Operators" defines the `angle` type as a "number of degrees". Section
8.1 defines "angle `sin` real", "angle `cos` real", and "num den `atan`
angle". The documentation for `atan` further states that it will return
an angle in degrees between 0 and 360.

Handle these operators correctly in `PostScriptEvaluator.execute`.
Convert the inputs to `sin` and `cos` from degrees to radians for use
with `Math.sin` and `Math.cos`. Correctly pop two values from the stack
for `atan`, use `Math.atan2`, and convert from radians to (positive)
degrees.
2023-03-03 17:25:11 -05:00
Calixte Denizet
fd03cd5493 [api-minor] Generate images in the worker instead of the main thread.
We introduced the use of OffscreenCanvas in #14754 and this patch aims
to use them for all kind of images.
It'll slightly improve performances (and maybe slightly decrease memory use).
Since an image can be rendered in using some transfer maps but because of
OffscreenCanvas we don't have the underlying pixels array the transfer maps
stuff is re-implemented in using the SVG filter feComponentTransfer.
2023-03-01 17:40:12 +01:00
Jonas Jenwald
45c332110e
Check OffscreenCanvas support once on the worker-thread
Currently we repeat the `FeatureTest.isOffscreenCanvasSupported` checks all over the worker-thread code, and with upcoming changes this will become even "worse".

Hence this patch, which changes the *worker-thread* default value for the `isOffscreenCanvasSupported`-parameter to `false` and moves the feature-testing into the `BasePdfManager`-constructor.

*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
2023-02-27 12:27:28 +01:00
Calixte Denizet
3a21423386 [Acroform] Use the full path to find the node in the XFA datasets where to store the value
I noticed several 'Path not found' errors because of a field called #subform[2].
From the XFA specs, the hash is used for a class of elements in the template tree.
When we're looking for a node in the datasets tree, it doesn't make sense to search
for a class. Hence the path element starting with a hash are just skipped.
2023-02-23 12:09:39 +01:00
Tim van der Meij
22618213c7
Merge pull request #16040 from Snuffleupagus/arrayBuffersToBytes
Re-factor the `arraysToBytes` helper function (PR 16032 follow-up)
2023-02-12 11:47:57 +01:00
Jonas Jenwald
6d4d402a78 Move the arrayBuffersToBytes helper function into the worker-thread
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the *built* `pdf.js` and `pdf.worker.js` files.
2023-02-11 21:34:37 +01:00
Jonas Jenwald
18042163ce Improve the consistency between the LocalPdfManager/NetworkPdfManager constructor
Currently these classes take a bunch of parameters (somewhat randomly ordered), probably because this is very old code that's been extended over the years.
Hence this patch changes the constructors to use parameter-objects instead, which improves consistency and (slightly) reduces the amount of code as well.

*Please note:* Also removes the `msgHandler`-property on these classes, since I cannot find a single call-site that accesses it.
2023-02-11 13:39:52 +01:00
Jonas Jenwald
14b0e8c0b6 Ensure that "GetAnnotations" errors are propagated to the main-thread (PR 15267 follow-up)
With the changes in PR 15267 we're now accidentally swallowing "GetAnnotations" errors, rather than propagating them to the main-thread as intended.
2023-02-10 12:18:35 +01:00