Commit Graph

2918 Commits

Author SHA1 Message Date
Jonas Jenwald
aeed6f2b67 Ignore named encoding for non-embedded symbol fonts (issue 16464)
The affected font is non-embedded ZapfDingbats, however the PDF document for some inexplicable reason specifies the encoding as "WinAnsiEncoding" (which is obviously wrong).
To work-around this bug in the PDF generator, we'll simply ignore any explicitly specified named encoding for non-embedded symbol fonts.
2023-05-24 10:48:47 +02:00
Jonas Jenwald
a6f9505a39
Merge pull request #16461 from Snuffleupagus/issue-16454
Improve "EI" detection in inline images (PR 12028 follow-up, issue 16454)
2023-05-23 22:23:22 +02:00
Calixte Denizet
a76a69e1ed Take into account the final space if any in the TJ command
The final space was just ignored and that led to wrongly position
the next chunk of text.
2023-05-23 17:09:32 +02:00
Jonas Jenwald
dfbbb8c0ac Improve "EI" detection in inline images (PR 12028 follow-up, issue 16454)
Given that inline images may contain "EI"-sequences in the image-data itself, actually finding the end-of-image operator isn't always straightforward.
Here we extend the implementation from PR 12028 to potentially check all of the following bytes, rather than stopping immediately. While we have fairly decent test-coverage for this code, whenever you're changing it there's unfortunately a slightly higher than normal risk of regressions. (You'd really wish that PDF generators just stop using inline images.)
2023-05-23 17:04:51 +02:00
Calixte Denizet
ca12bca276 Sanitize the glyph bounding box
- if the contours count is lower than -1, the glyph is really likely wrong
so just remove it from the font;
- if a contour has the repeat flag then repeats count mustn't be 0.
2023-05-21 16:24:41 +02:00
Jonas Jenwald
f657de7de2 Extend getNonStdFontMap for non-embedded Impact fonts (bug 1365930)
According to https://en.wikipedia.org/wiki/Impact_(typeface) this font should be available on all current versions of Windows, and with the recently added font-substitution we should actually be able to render it correctly (at least on Windows).
2023-05-19 18:40:03 +02:00
Jonas Jenwald
8c4821ceda [api-minor] Slightly shorten the marked-content ids used in the textLayer
Generally we try to keep the ids that we create short, hence we can slightly shorten the "static" parts of them.
2023-05-18 22:32:10 +02:00
Jonas Jenwald
04de155aaa Slightly shorten the loadedName-ids used with font-substitutions
Generally we try to keep the ids that we create short, hence we can slightly shorten the "static" part of them.
2023-05-18 22:27:11 +02:00
Jonas Jenwald
3be66f59d6
Merge pull request #16440 from Snuffleupagus/more-modern-JS
Introduce even more modern JavaScript features in the code-base
2023-05-18 20:56:00 +02:00
Calixte Denizet
3091e70aad Flush the current chunk when the font changed because of a restore op (issue #14755) 2023-05-18 19:37:16 +02:00
Jonas Jenwald
e8030752f3 Introduce even more modern JavaScript features in the code-base
After PR 12563 we're now free to use e.g. logical OR assignment, nullish coalescing, and optional chaining in the entire code-base.
2023-05-18 18:55:41 +02:00
Jonas Jenwald
4355e76c60 Simplify the fontID handling in PartialEvaluator.loadFont
The `fontID` handling is quite old and predates the use of the `idFactory` to generate a unique id for each font, hence we can simplify this code a little bit.
2023-05-18 13:09:08 +02:00
Tim van der Meij
ac8032628b
Merge pull request #16424 from Snuffleupagus/core-optional-chaining
Introduce more optional chaining in the `src/core/` folder
2023-05-18 12:40:08 +02:00
calixteman
839be801a0
Merge pull request #16433 from calixteman/bug1825002
For text widgets, get the text from the AP stream instead of from the format callback (bug 1825002)
2023-05-17 16:48:59 +02:00
Calixte Denizet
177036e6ae For text widgets, get the text from the AP stream instead of from the format callback (bug 1825002)
When fixing bug 1766987, I thought the field formatted value came from
the result of the format callback: I was wrong. The format callback is ran
but the value is unused (maybe it's useful to set some global vars... or
it's just a bug in Acrobat). Anyway the value to display is the one rendered
in the AP stream.
The field value setter has been simplified and that fixes issue #16409.
2023-05-17 14:07:28 +02:00
Jonas Jenwald
bfb374dbf6 Attempt to fallback to a default font, for non-available ones, in more cases (issue 16432)
This essentially extends PR 11218 to also apply when looking up the final font-reference, via the XRef-table, fails because the font isn't available.

This patch also changes `PartialEvaluator.fallbackFontDict` to simply use "Helvetica" as the default font-name, since that seems generally reasonable given the now existing font-substitution code.
2023-05-17 11:41:08 +02:00
Calixte Denizet
385f275ad9 Warn when pdf.js can't load an OS font 2023-05-16 14:58:38 +02:00
Calixte Denizet
4e8dd54e8e For non-embedded fonts, don't generate the fallback several times 2023-05-15 20:02:45 +02:00
Calixte Denizet
b264e0301a Simplify the code to generate font substitution information 2023-05-15 19:17:52 +02:00
Jonas Jenwald
1b4a7c5965 Introduce more optional chaining in the src/core/ folder
After PR 12563 we're now free to use optional chaining in the worker-thread as well. (This patch also fixes one previously "missed" case in the `web/` folder.)

For the MOZCENTRAL build-target this patch reduces the total bundle-size by `1.6` kilobytes.
2023-05-15 12:38:28 +02:00
Calixte Denizet
d4b70ec306 For missing font, use a local font if it exists even if there's no standard substitution
If the font foo is missing we just try lo load local(foo) and maybe
we'll be lucky.
2023-05-13 21:54:27 +02:00
Calixte Denizet
cfb908c999 Add a cache to avoid to load several times a local font
On my computer, it takes few tenths of a second to load a local font.
Since a font can be used several times in a document, the cache will
improve performances.
2023-05-10 20:01:21 +02:00
calixteman
2d2f7b315e
Merge pull request #16363 from calixteman/use_local_font
[api-minor] Use a local font or fallback on an embedded one (if it exists) for non-embedded fonts (bug 1766039)
2023-05-10 14:19:05 +02:00
Calixte Denizet
53134c0c0b [api-minor] Use a local font or fallback on an embedded one (if it exists) for non-embedded fonts (bug 1766039)
- Replace FoxitSans with LiberationSans: LiberationSans is already there (for XFA) and we can use
it as a good replacement of FoxitSans.
- For now we just try to substitue standard fonts, the strategy is the following:
  * we try to find a font locally from a hardcoded list;
  * if it fails then we use Liberation as fallback (only for Helvetica for the moment);
  * else we just fallback on the system serif/sansserif/monospace font.
2023-05-10 14:10:23 +02:00
Calixte Denizet
2486536843 Compress the data when saving annotions
CompressionStream API has been added in Firefox 113
(see https://bugzilla.mozilla.org/show_bug.cgi?id=1823619)
hence we can use it to compress the streams with added/modified
annotations.
2023-05-09 14:46:50 +02:00
calixteman
8f2d8f62f3
Merge pull request #16397 from calixteman/issue14565
Make something similar to Acrobat when Underline annotation has no appearance
2023-05-08 21:16:49 +02:00
Tim van der Meij
bfb664b9a1
Merge pull request #16398 from Snuffleupagus/xfa-optional-chaining
Introduce some optional chaining in the `src/core/xfa/` folder
2023-05-07 14:54:05 +02:00
Jonas Jenwald
1753e321cd Remove the compatibility checks in WorkerMessageHandler.createDocumentHandler
For some time these checks have only targeted Node.js environments, since the features in question exist in all supported browsers (even when a `legacy`-build is used).

Now that we've updated the minimum supported Node.js version to 18, a number of polyfills are thus (finally) no longer necessary in that environment. Hence for certain *basic* functionality, such as e.g. text-extraction, it's now possible to use either a modern- or a `legacy`-build of the PDF.js library in Node.js environments.

*Please note:* For e.g. canvas-rendering in Node.js environments it's still necessary to use a `legacy`-build, since that functionality requires various polyfills.
2023-05-07 13:43:19 +02:00
Jonas Jenwald
ed8be6f882 [api-minor] Update the minimum supported Node.js version to 18
This patch updates the minimum supported environments as follows:
 - Node.js 18, which was released on 2022-04-19; see https://en.wikipedia.org/wiki/Node.js#Releases

Note also that Node.js 16 will soon reach EOL, and thus no longer receive any security updates.
2023-05-07 13:43:19 +02:00
Jonas Jenwald
89f768322d Introduce some optional chaining in the src/core/xfa/ folder
After PR 12563 we're now free to use optional chaining in the worker-thread as well.
2023-05-07 12:49:07 +02:00
Calixte Denizet
6c0fdc6ec2 Make something similar to Acrobat when Underline annotation has no appearance 2023-05-06 21:19:25 +02:00
Jonas Jenwald
722e5910e1 Improve handling of JPEG images with non-standard /Decode-entries (issue 16395)
The /Decode-implementation in the our JPEG decoder, i.e. `src/core/jpg.js`, seems to only handle *inverting* of images properly. To support arbitrary /Decode-entries correctly we'll always use the `PDFImage.decodeBuffer` method, even for "simple" JPEG images, which should be fine since non-default /Decode-entries aren't a very common occurrence.

*Please note:* This patch will lead to a little bit of movement in some existing test-cases, however it should be virtually imperceivable to the naked eye.
2023-05-06 13:55:39 +02:00
calixteman
f151a39d14
Merge pull request #16387 from calixteman/issue16384
[Annotations] Draw readonly annotations on their own canvas and show the HTML elements when there is a JS interaction (issue #16384)
2023-05-04 21:49:08 +02:00
Calixte Denizet
72da14f005 [Annotations] Draw readonly annotations on their own canvas and show the HTML elements when there is a JS interaction (issue #16384) 2023-05-04 20:08:32 +02:00
calixteman
a24e11a91c
Merge pull request #16106 from bungeman/improve_color_stop_detection
Better approximate gradient color stops
2023-05-04 19:48:57 +02:00
Jonas Jenwald
667085ee33
Merge pull request #16368 from Snuffleupagus/rm-GlobalImageCache-addPageIndex
Inline the `addPageIndex` method in `GlobalImageCache.shouldCache`
2023-05-04 12:09:04 +02:00
Jonas Jenwald
001acfb5ac
Merge pull request #16381 from Snuffleupagus/rm-isStandardFont-prop
Remove the unused `isStandardFont` font-property (PR 15880 follow-up)
2023-05-04 00:30:05 +02:00
Jonas Jenwald
24a75bda5d Remove the unused isStandardFont font-property (PR 15880 follow-up)
This property was added in PR 12726 specifically for use in the `getFontType` function, indirectly used by the `PDFDocumentProxy.stats` getter in the API.
In PR 15880 that functionality was removed, but I forgot to remove this now unused font-property.
2023-05-03 11:52:54 +02:00
Jonas Jenwald
88616f77ae Remove the closure from BitModel in the src/core/jpx.js file 2023-04-29 13:49:39 +02:00
Jonas Jenwald
b0a1af306d Simplify initialization of static class properties in the worker-thread
Now that we no longer depend on the old Babel version in SystemJS we can remove the `static get ...` work-arounds used to define constants, which leads to slightly more compact code.
2023-04-29 13:49:38 +02:00
Jonas Jenwald
d950b91c4e Introduce some logical assignment in the src/core/ folder 2023-04-29 13:49:37 +02:00
Jonas Jenwald
317abd6d07 Change the createPromiseCapability helper function into a PromiseCapability class
This is not only slightly more compact, but it also simplifies the handling of the `settled` getter.
2023-04-29 13:43:24 +02:00
Jonas Jenwald
94c2d08975 Revert "Add a getArrayLookupTableFactory helper function and use it to re-format src/core/{glyphlist, unicode}.js"
This reverts commit 56fa6d414c now that SystemJS is gone.
2023-04-29 13:43:24 +02:00
Jonas Jenwald
95bf9fc17f Remove SystemJS usage, in development mode, from the worker
Now that https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 has landed in Firefox, we're able to use worker-modules during development :-)

This removes the final piece of SystemJS usage from the PDF.js library, thus allowing a fair bit of clean-up, and we now use *only* native `import`/`export` statements everywhere in development mode.
2023-04-29 13:43:24 +02:00
Jonas Jenwald
bb1228cb64 Inline the addPageIndex method in GlobalImageCache.shouldCache
When the `GlobalImageCache` implementation originally landed, back in PR 11912, the image handling was slightly more complex (with e.g. browser-decoding of some JPEG images). At this point it no longer seems necessary to manually handle pageIndexes in this way, and we should be able to simply inline that in the `GlobalImageCache.shouldCache` method.
2023-04-28 09:40:32 +02:00
Jonas Jenwald
e12535457f Avoid some repeated stringToBytes-calls in the src/core/crypto.js file
Currently we repeatedly lookup, and convert to bytes, the "O" and "U" encryption-dictionary entries.
2023-04-26 17:52:46 +02:00
Jonas Jenwald
74585c7c59 Remove the unused PDF20.hash method
This method was added in PR 4938, almost nine years ago, however it doesn't appear to ever have been used.
Given the similarities between the `PDF17` and `PDF20` classes, and how they're used, if the `PDF20.hash` method was actually necessary you'd also expect a similiar method in the `PDF17` class.
2023-04-23 10:13:46 +02:00
Jonas Jenwald
5e0722e4c2 Remove the PDF20 closure, in the src/core/crypto.js file
To allow doing this the existing helper function was changed into a "private" method instead.
2023-04-23 10:08:17 +02:00
Jonas Jenwald
9cb3236ac0 Remove the remaining unnecessary closures in the src/core/primitives.js file 2023-04-22 15:33:04 +02:00
Tim van der Meij
e304423ba1
Merge pull request #16331 from Snuffleupagus/cmap-rm-closure
Remove unnecessary closures in the CMap code
2023-04-22 14:58:13 +02:00
Tim van der Meij
c9359957e6
Merge pull request #16305 from Snuffleupagus/PDFJSDev-skip-PRODUCTION
Remove the `PRODUCTION` build-target
2023-04-22 14:53:30 +02:00
Jonas Jenwald
bc7aa8a585 Re-factor some String.fromCharCode usage in the src/core/binary_cmap.js file
We can replace one case of `apply` with rest parameters, and avoid doing repeated `String.fromCharCode` calls within a loop.
2023-04-21 12:21:31 +02:00
Jonas Jenwald
cabc98f310 Remove the remaining closure in the src/core/cmap.js file
With modern JavaScript we (usually) no longer need to keep old closures, which slightly reduces the size of the code.
2023-04-21 12:21:31 +02:00
Jonas Jenwald
244002502b Move the BinaryCMapReader into its own file
The "binary" CMap-format is specific to the PDF.js library, and is used to reduce the size of the built-in CMap data-files.
By moving this code to its own file we can remove the nowadays unnecessary closures, which helps to slightly reduce the size of this code.
2023-04-21 12:21:20 +02:00
Calixte Denizet
19ca41896e Correctly clip the text in the text layer (fixes #16316) 2023-04-18 17:00:42 +02:00
Calixte Denizet
117bbf7cd9 [api-minor] Don't normalize the text used in the text layer.
Some arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized
when creating the search query. So to avoid to duplicate the normalization code,
everything is moved in the find controller.
The previous code to normalize text was using NFKC but with a hardcoded map, hence it
has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size
by 30kb).
In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into
account some RTL unicode ranges, the generated font wasn't embedding the mapping this
char and the unicode ranges in the OS/2 table weren't up-to-date.

When normalized some chars can be replaced by several ones and it induced to have
some extra chars in the text layer. To avoid any regression, when copying some text
from the text layer, a copied string is normalized (NFKC) before being put in the
clipboard (it works like this in either Acrobat or Chrome).
2023-04-17 14:31:23 +02:00
Jonas Jenwald
804aa896a7 Stop using the PRODUCTION build-target in the JavaScript code
This *special* build-target is very old, and was introduced with the first pre-processor that only uses comments to enable/disable code.
When the new pre-processor was added `PRODUCTION` effectively became redundant, at least in JavaScript code, since `typeof PDFJSDev === "undefined"` checks now do the same thing.

This patch proposes that we remove `PRODUCTION` from the JavaScript code, since that simplifies the conditions and thus improves readability in many cases.
*Please note:* There's not, nor has there ever been, any gulp-task that set `PRODUCTION = false` during building.
2023-04-17 12:04:34 +02:00
Jonas Jenwald
c79bdd6ae6 Simplify the CFFCompiler.compileTypedArray method
Rather than manually creating the Array, we can use the now existing `Array.from` method instead.
2023-04-15 11:13:34 +02:00
Jonas Jenwald
0ce568e789 Remove CFFCompiler.compileGlobalSubrIndex since it's completely unused
This method was originally added in PR 1320, eleven years ago, however it doesn't appear to ever have been used (not even from the start).
Furthermore, this method also tries to access a property that doesn't exist (`this.out`) and then call a method that also doesn't exist (`writeByteArray`).
2023-04-15 11:13:21 +02:00
Jonas Jenwald
ab2773416b
Merge pull request #16291 from Snuffleupagus/issue-16289
Limit the `Path2D`-checks in the worker-thread to Node.js (PR 16238 follow-up, issue 16289)
2023-04-14 21:26:12 +02:00
Calixte Denizet
5eab8ec610 Avoid when it's possible to use Array.concat when compiling a CFF font
In looking at https://bugs.ghostscript.com/show_bug.cgi?id=706451 I noticed that bug2.pdf was pretty
slow to load for such a basic file.
In profiling I noticed that a lot of time is spent in Array.concat, hence this patch use Array.push when
it's possible (it's now ~3 times faster).
2023-04-14 19:01:01 +02:00
Jonas Jenwald
edd13895dd Limit the Path2D-checks in the worker-thread to Node.js (PR 16238 follow-up, issue 16289)
The changes in PR 16238 were intended specifically for Node.js environments, however they accidentally applied to older browsers as well.

*Please note:* In up-to-date browsers `Path2D` is available in Workers, which should be connected to the introduction of `OffscreenCanvas`.
2023-04-14 11:51:11 +02:00
Jonas Jenwald
3a36a9d337
Merge pull request #16268 from Snuffleupagus/RegionalImageCache
Attempt to also cache images at the "page"-level (issue 16263)
2023-04-11 12:06:29 +02:00
calixteman
c1c372c320
Merge pull request #16225 from calixteman/16224
Thin whitespaces must have their own span
2023-04-11 11:13:16 +02:00
Jonas Jenwald
9881dbf927 Attempt to also cache images at the "page"-level (issue 16263)
Currently we have two separate image-caches on the worker-thread:
 - A local one, which is unique to each `PartialEvaluator.getOperatorList` invocation. This one caches both names *and* references, since image-resources may be accessed in either way.
 - A global one, which applies to the entire PDF documents and all its pages. This one only caches references, since nothing else would work.

This patch introduces a third image-cache, which essentially sits "between" the two existing ones. The new `RegionalImageCache`[1] will be usable throughout a `PartialEvaluator` instance, and consequently it *only* caches references, which thus allows us to keep track of repeated image-resources found in e.g. different /Form and /SMask objects.

---
[1] For lack of a better word, since naming things is hard...
2023-04-10 11:34:41 +02:00
Tim van der Meij
13f2426aab
Merge pull request #16238 from Snuffleupagus/update-Node-compat-check
Update the Node.js compatibility-check in the worker-thread
2023-04-01 14:20:33 +02:00
Jonas Jenwald
57a307d0cd Update the Node.js compatibility-check in the worker-thread
*Please note:* In Node.js environments a `legacy`-build must be used since only those versions include any polyfills.

Previously we'd only check if `ReadableStream` is natively supported, however since Node.js version 18 that's now been implemented; please see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility
Hence we'll also check for the availability of `Path2D`, since that's browser-specific functionality not expected to be available in Node.js environments; please see https://developer.mozilla.org/en-US/docs/Web/API/Path2D#browser_compatibility
2023-03-30 18:36:15 +02:00
Jonas Jenwald
5063a6f2a9 [api-minor] Remove the disableCombineTextItems option
*Please note:* This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons.

This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous *default* text-extraction behaviour.
However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in *more* (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together.

Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.
2023-03-30 14:23:38 +02:00
Calixte Denizet
4b7eb1436d Thin whitespaces must have their own span 2023-03-29 11:23:58 +02:00
calixteman
622465dc20
Merge pull request #16223 from calixteman/16221
Create a new chunk when the char is too rised compared to the previous one
2023-03-28 15:30:14 +02:00
Calixte Denizet
a96f10e55d Create a new chunk when the char is too rised compared to the previouse one 2023-03-28 13:56:46 +02:00
Jonas Jenwald
d584513cb2
Merge pull request #16213 from Snuffleupagus/validateCSSFont-quotes
Reduce duplication in the `validateCSSFont` helper function
2023-03-28 12:40:23 +02:00
Jonas Jenwald
20cbb89412 Simplify the isPDFFunction helper function
Originally we used helper functions for checking if something was a Dictionary or Stream, and then having an initial `typeof` check probably made sense.
However, given that we're using `instanceof` nowadays the additional check longer seems necessary.
2023-03-27 11:34:20 +02:00
Jonas Jenwald
ef70988027 Reduce duplication in the validateCSSFont helper function
Currently we're *virtually* duplicating the same code, for validating quotation marks, twice in this helper function.

The size decrease is quite small (107 bytes) and this makes the code slightly harder to reader, hence I completely understand if this patch is rejected.
2023-03-26 12:12:49 +02:00
Jonas Jenwald
035a273d30 Use replaceAll in the recoverJsURL helper function
We can just do direct replacement when building the regular expression, rather than splitting the string into an Array and then re-joining it.
2023-03-25 12:31:39 +01:00
Jonas Jenwald
96e34fbb7d Enable the unicorn/prefer-negative-index ESLint plugin rule
Please see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-negative-index.md
2023-03-24 10:18:32 +01:00
Jonas Jenwald
1fc09f0235 Enable the unicorn/prefer-string-replace-all ESLint plugin rule
Note that the `replaceAll` method still requires that a *global* regular expression is used, however by using this method it's immediately obvious when looking at the code that all occurrences will be replaced; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll#parameters

Please find additional details at https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-string-replace-all.md
2023-03-23 12:57:10 +01:00
Jonas Jenwald
5f64621d46 Use String.prototype.replaceAll() where appropriate
This fairly new method allows replacing *multiple* occurrences within a string without having to use regular expressions.

Please refer to:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll#browser_compatibility
2023-03-22 15:31:10 +01:00
Jonas Jenwald
137a2d6e30 Add even more non-standard ligatures (PR 15517 follow-up)
Given that we already create multi-byte ToUnicode entries in other cases, see e.g. the `getNormalizedUnicodes` table, this is hopefully fine.
2023-03-22 10:42:52 +01:00
Jonas Jenwald
122d5e549a Track previous "XRefStm"s in a Set, rather than an Object
Having just reviewed a patch touching this code, I couldn't help noticing that an `Object` isn't really the optimal data-structure for this and nowadays we can do better by using a `Set` instead.
2023-03-22 09:41:19 +01:00
Jonas Jenwald
9321758d91
Merge pull request #16186 from Snuffleupagus/issue-16176
Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176)
2023-03-21 22:17:18 +01:00
Jonas Jenwald
d4bcfe8c16 Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176)
Hopefully this makes sense, since we already "create" multi-byte ToUnicode entries in other cases (see e.g. the `getNormalizedUnicodes` table).
2023-03-21 21:35:57 +01:00
Calixte Denizet
2d0f30a67c Use the position of the previous xref stream if any when saving a pdf (bug 1823296) 2023-03-21 19:27:24 +01:00
Jonas Jenwald
50c844c5b8 Stop including isOffscreenCanvasSupported in the "StartRenderPage" message
With the previous commit this is now completely unused in API, hence it can be removed. This is done in a separate commit to make it easier to re-instate it, would the need ever arise.
2023-03-14 13:09:20 +01:00
Tim van der Meij
9819f1cc6b
Merge pull request #16108 from Snuffleupagus/delay-cleanup
Slightly delay cleanup, after rendering, in documents with large images
2023-03-11 15:52:12 +01:00
calixteman
b2a86350fc
Merge pull request #16096 from bungeman/fix_trig_functions
Correct PostScript trigonometric operators
2023-03-11 14:32:23 +01:00
Ben Wagner
5fad91a680 Better approximate gradient color stops
PDF gradients do not have color stops but an arbitrary PDF function of
the type f(t) -> color. CSS gradients are only based on color stops.
Most PDF gradient functions are produced from color stop oriented
gradients.

Take advantage of this by sampling the PDF function at a higher
frequency but not converting any samples which could be interpolated to
color stops. The sampling frequency is chosen to be the least common
multiple of as many values as practical to exactly re-create the common
case of the PDF function implementing equally spaced linearly
interpolated stops in RGB color space. This also allows for better
approximation of other smooth PDF functions (non-linear, or non-equally
spaced, or in different color space).

Fixes: #10572, #14165
2023-03-09 08:49:50 -05:00
Jonas Jenwald
c0671ac133 Slightly increase the maximum image sizes that we'll cache
The current value originated in PR 2317, and in the decade that have passed the amount of RAM available in (most) devices should have increased a fair bit.
Nowadays we also do a much better job of detecting repeated images at both the page- and document-level, which helps reduce overall memory-usage in many documents.

Finally the constant is also moved into the `src/shared/util.js` file, since it was implicitly used on both the main- and worker-thread previously.
2023-03-08 17:06:10 +01:00
Jonas Jenwald
6839f15a32
Merge pull request #16128 from Snuffleupagus/issue-16127
Support (rare) Type3 fonts with Pattern resources (issue 16127)
2023-03-08 12:21:53 +01:00
Jonas Jenwald
e5427ab11b
Merge pull request #16122 from Snuffleupagus/rm-onUnsupportedFeature
[api-minor] Remove the deprecated `onUnsupportedFeature` functionality (PR 15758 follow-up)
2023-03-08 12:16:27 +01:00
calixteman
cc555a389b
Merge pull request #16117 from calixteman/workaround_bug1820511
Avoid to have a factor too close to 2 when downscaling image
2023-03-08 11:12:56 +01:00
Calixte Denizet
1617ee6c3f Avoid to have a factor too close to 2 when downscaling image
It's a workaround for bug 1820511: it only affects Firefox on Windows
using the D2D backend.
2023-03-08 11:05:46 +01:00
Calixte Denizet
e9474f1c84 [api-minor] Add an option to set the max canvas area 2023-03-08 10:37:06 +01:00
Jonas Jenwald
471aef5fc6 Support (rare) Type3 fonts with Pattern resources (issue 16127)
This simply extends the approach in PR 10727 to also cover Patterns, which shouldn't be a common occurrence in Type3 fonts (since this is the first issue we've seen).
2023-03-08 09:20:52 +01:00
Calixte Denizet
b8dda089e2 Slightly modify the max width of a tracking space 2023-03-07 19:38:49 +01:00
Calixte Denizet
8db77cc361 Use appearance stream to render locked annotations (bug 1723568) 2023-03-07 15:01:31 +01:00
Jonas Jenwald
2f3dcc2327 [api-minor] Remove the deprecated onUnsupportedFeature functionality (PR 15758 follow-up)
This was deprecated in PR 15758, which has now been included in three official PDF.js releases.
While PR 15880 did limit the bundle-size impact of this functionality on e.g. the Firefox PDF Viewer, it still leads to some unnecessary "bloat" that these changes remove.
Furthermore, with this being deprecated there'd also be no effort put into e.g. extending the `UNSUPPORTED_FEATURES` list when handling future error cases.
2023-03-07 10:18:43 +01:00
Calixte Denizet
3849063d36 [Annotation] Don't rotate an annotation when it has the NoRotate flag 2023-03-06 17:27:11 +01:00
Calixte Denizet
05b0c9d7e6 Render large images even if they're larger than the canvas limits (bug 1720282)
The idea is to encode large image in BMP format (which is very simple and doesn't
require to compute any checksums) and then use createImageBitmap with a BMP blob
(which doesn't suffer of the Canvas/ImageData limits).
From a performance point of view, it isn't crazy (generating a large blob + decoding
it on the main thread is really not ideal) but at least we've something to display
which is a way better than a blank page (and one can notice that most of the time is
spent in decoding the image from the pdf stream).
2023-03-05 14:07:07 +01:00
Ben Wagner
158c836e26 Correct PostScript trigonometric operators
PDF 32000-1:2008 7.10.5.1 "Type 4 (PostScript Calculator) Functions"
defers to the PostScript Language Reference for the description of these
functions. The PostScript Language Reference, third edition chapter 8
"Operators" defines the `angle` type as a "number of degrees". Section
8.1 defines "angle `sin` real", "angle `cos` real", and "num den `atan`
angle". The documentation for `atan` further states that it will return
an angle in degrees between 0 and 360.

Handle these operators correctly in `PostScriptEvaluator.execute`.
Convert the inputs to `sin` and `cos` from degrees to radians for use
with `Math.sin` and `Math.cos`. Correctly pop two values from the stack
for `atan`, use `Math.atan2`, and convert from radians to (positive)
degrees.
2023-03-03 17:25:11 -05:00
Calixte Denizet
fd03cd5493 [api-minor] Generate images in the worker instead of the main thread.
We introduced the use of OffscreenCanvas in #14754 and this patch aims
to use them for all kind of images.
It'll slightly improve performances (and maybe slightly decrease memory use).
Since an image can be rendered in using some transfer maps but because of
OffscreenCanvas we don't have the underlying pixels array the transfer maps
stuff is re-implemented in using the SVG filter feComponentTransfer.
2023-03-01 17:40:12 +01:00
Jonas Jenwald
45c332110e
Check OffscreenCanvas support once on the worker-thread
Currently we repeat the `FeatureTest.isOffscreenCanvasSupported` checks all over the worker-thread code, and with upcoming changes this will become even "worse".

Hence this patch, which changes the *worker-thread* default value for the `isOffscreenCanvasSupported`-parameter to `false` and moves the feature-testing into the `BasePdfManager`-constructor.

*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
2023-02-27 12:27:28 +01:00
Calixte Denizet
3a21423386 [Acroform] Use the full path to find the node in the XFA datasets where to store the value
I noticed several 'Path not found' errors because of a field called #subform[2].
From the XFA specs, the hash is used for a class of elements in the template tree.
When we're looking for a node in the datasets tree, it doesn't make sense to search
for a class. Hence the path element starting with a hash are just skipped.
2023-02-23 12:09:39 +01:00
Tim van der Meij
22618213c7
Merge pull request #16040 from Snuffleupagus/arrayBuffersToBytes
Re-factor the `arraysToBytes` helper function (PR 16032 follow-up)
2023-02-12 11:47:57 +01:00
Jonas Jenwald
6d4d402a78 Move the arrayBuffersToBytes helper function into the worker-thread
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the *built* `pdf.js` and `pdf.worker.js` files.
2023-02-11 21:34:37 +01:00
Jonas Jenwald
18042163ce Improve the consistency between the LocalPdfManager/NetworkPdfManager constructor
Currently these classes take a bunch of parameters (somewhat randomly ordered), probably because this is very old code that's been extended over the years.
Hence this patch changes the constructors to use parameter-objects instead, which improves consistency and (slightly) reduces the amount of code as well.

*Please note:* Also removes the `msgHandler`-property on these classes, since I cannot find a single call-site that accesses it.
2023-02-11 13:39:52 +01:00
Jonas Jenwald
14b0e8c0b6 Ensure that "GetAnnotations" errors are propagated to the main-thread (PR 15267 follow-up)
With the changes in PR 15267 we're now accidentally swallowing "GetAnnotations" errors, rather than propagating them to the main-thread as intended.
2023-02-10 12:18:35 +01:00
Jonas Jenwald
c56f25409d Re-factor the arraysToBytes helper function (PR 16032 follow-up)
Currently this helper function only has two call-sites, and both of them only pass in `ArrayBuffer` data. Given how it's implemented there's a couple of code-paths that are completely unused (e.g. the "string" one), and in particular the intended fast-paths don't actually work.
This patch re-factors and simplifies the helper function, and it'll no longer accept anything except `ArrayBuffer` data (hence why it's also re-named).

Note that at the time when `arraysToBytes` was added we still supported browsers without TypedArray functionality, and we'd then simulate them using regular Arrays.
2023-02-10 10:26:35 +01:00
Jonas Jenwald
5ba596786c Change WorkerTasks, in WorkerMessageHandler.createDocumentHandler, to a use a Set
This is a tiny bit more compact, thanks to the `Set.prototype.delete` method.
2023-02-09 22:01:16 +01:00
calixteman
0fca6e187c
Merge pull request #16035 from calixteman/fix_combo_value
[Annotation] A combo can have a value other than one in the options
2023-02-09 19:56:16 +01:00
Jonas Jenwald
1fc8350795
Merge pull request #16032 from Snuffleupagus/less-arrayByteLength
Reduce usage of the `arrayByteLength` helper function
2023-02-09 18:56:20 +01:00
Calixte Denizet
cb1638530d [Annotation] A combo can have a value other than one in the options
When printing the pdf in #12233 in Acrobat, we can see that the combo for country
is empty: it's because the V entry doesn't have to be one of the options.
2023-02-09 18:50:57 +01:00
calixteman
972744a68f
Merge pull request #16033 from calixteman/bug1640217
Ignore position of combining diacritics when getting text (bug 1640217)
2023-02-09 18:23:59 +01:00
Calixte Denizet
58e4d92884 [Annotation] For choice widget, use the I entry instead of the V one (bug 1770750)
It isn't really conform to the specifications but Acrobat is working like that...
2023-02-09 17:26:13 +01:00
Calixte Denizet
4e9f26afa3 Ignore position of combining diacritics when getting text (bug 1640217) 2023-02-09 17:13:57 +01:00
Jonas Jenwald
96d338e437 Reduce usage of the arrayByteLength helper function
We're using this helper function when reading data from the [`PDFWorkerStreamReader.read`](a49d1d1615/src/core/worker_stream.js (L90-L98)) and [`PDFWorkerStreamRangeReader.read`](a49d1d1615/src/core/worker_stream.js (L122-L128)) methods, and as can be seen they always return `ArrayBuffer` data. Hence we can simply get the `byteLength` directly, and don't need to use the helper function.

Note that at the time when `arrayByteLength` was added we still supported browsers without TypedArray functionality, and we'd then simulate them using regular Arrays.
2023-02-09 15:50:38 +01:00
Jonas Jenwald
323d3d246a Re-factor the readChunk function in ChunkedStreamManager.sendRequest
Move the `done` branch to the top of the function, similar to how we usually format things when `ReadableStream`s are used.
2023-02-09 15:33:06 +01:00
Calixte Denizet
a25895bf72 [Annotation] Take into account the stroke alpha for a FreeText without appearance 2023-02-07 22:15:27 +01:00
Calixte Denizet
ea7b4b4d6c [Annotation] Avoid to encrypt the appearance stream two times (bug 1815476) 2023-02-07 19:26:46 +01:00
Jonas Jenwald
3a7fce49a3 A tiny improvement of the MetadataParser._repair method
We can just insert the initial greater-than sign at the start of the buffer, rather than doing that manually at the end.
2023-02-04 12:43:55 +01:00
Jonas Jenwald
808ca828f1 Extend getGlyphMapForStandardFonts with additional entries (issue 15977) 2023-01-30 12:13:21 +01:00
Jonas Jenwald
40a46e4397 Tweak adjustType1ToUnicode for fonts with a predefined *named* encoding (bug 1811668, PR 14050 follow-up)
*Please note:* I cannot reproduce the problem reported in bug 1811668, regarding the context menu, and in any case it's not clear that that part is even a PDF Viewer bug.

Looking at bug 1811668 I couldn't help but noticing that the textLayer isn't correct, and it's unfortunately once again a problem with the `adjustType1ToUnicode` function. That's intended to help improve text-selection for fonts without a /ToUnicode-entry, and in many cases it does help (the original PR fixed lots of issues) however it's also caused some problems.

In order to improve text-selection in bug 1811668, we'll now properly ignore fonts that have a predefined *named* encoding specified since that's really the intention with PR 14050.
2023-01-21 12:21:21 +01:00
Jonas Jenwald
f2fce93826 [JBIG2] Ensure that the decodeInteger function returns valid integers (issue 15942)
The JBIG2 images in this PDF document are corrupt enough that even Adobe Reader warns about it when opening the file.
*Please note:* I don't really know the JBIG2 image format at all, however from a very brief look at the specification it seems that integers should be 32-bit.
2023-01-19 17:14:17 +01:00
Jonas Jenwald
d6be5141e9 Fallback to using the name table to infer the encoding for TrueType fonts missing such data (issue 15910)
The relevant TrueType font is missing both /ToUnicode *and* /Encoding entires, either of which would have prevented the (current) broken textLayer rendering.
My first idea was that we could use the `post` table in the TrueType font, see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6post.html, to get the actual glyphNames and amend the fallback ToUnicode-map that way. Unfortunately that didn't work, since the `post` table only contained ".notdef" and "" (i.e. empty string) entries.

Instead we try to use the `name` table in the TrueType font, see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6name.html, to determine if the platform is Windows and thus fallback to generate a ToUnicode-map from the `WinAnsiEncoding`.
2023-01-17 16:04:51 +01:00
Jonas Jenwald
cefaecc2e8 Ensure that Annotation appearance-entries are actually Streams
Note how all over the `src/core/annotation.js`-code we're assuming that if an `appearance`-entry exists it's also a Stream. However, we're not actually checking that thoroughly enough which causes issues in some badly generated PDF documents.
2023-01-16 13:02:53 +01:00
calixteman
fcaeb5db88
Merge pull request #15901 from calixteman/15289_followup
Avoid null ExpansionFactor in type1 fonts (follow-up of #15289)
2023-01-07 18:20:31 +01:00
Jonas Jenwald
74e4b515c5
Merge pull request #15897 from Snuffleupagus/issue-15893
Support parsing encrypted documents in `XRef.indexObjects` (issue 15893)
2023-01-07 16:55:41 +01:00
Calixte Denizet
c170245fc0 Avoid null ExpansionFactor in type1 fonts (follow-up of #15289) 2023-01-07 16:25:24 +01:00
Calixte Denizet
e565e455e2 Set ExpansionFactor to 0.06 when it's equals to 0 in the private dict of CFF fonts 2023-01-07 14:53:13 +01:00
Jonas Jenwald
7d94fdeb48 Support parsing encrypted documents in XRef.indexObjects (issue 15893)
*Please note:* The reduced test-case is *not* a perfect reproduction of the original PDF document, since this one fails to open in e.g. Adobe Reader, but I do believe that it captures the most important points here.

For corrupt *and* encrypted PDF documents, it's possible that only some trailer dictionaries actually contain an /Encrypt-entry. Previously we'd could easily miss that, since we generally pick the first not obviously corrupt trailer dictionary, and the solution implemented here is to simply pre-parse all trailer dictionaries to see if there's any /Encrypt-entries.
2023-01-06 13:09:37 +01:00
Jonas Jenwald
6bdbb5c5ca Update the type/subtype at the end of font parsing
This fixes a warning reported by CodeQL, and should also make general sense given that we parse the font-data to determine the *actual* `type`/`subtype` rather than trusting the PDF document.
2023-01-02 16:21:48 +01:00
Jonas Jenwald
1a69d537c1 [api-minor] Limit the PDFDocumentLoadingTask.onUnsupportedFeature functionality to GENERIC builds (PR 15758 follow-up)
This was deprecated in PR 15758 but it's unfortunately quite difficult to tell if third-party users are depending on this, e.g. to implement custom error reporting, and if so to what extent.
However, thanks to the pre-processor we can limit *most* of this code to GENERIC builds which still seem like a worthwhile change.

These changes reduce the bundle size of the Firefox PDF Viewer by 3.8 kB in total.
2023-01-01 17:53:12 +01:00
Jonas Jenwald
0c1fb4e740 [api-minor] Remove the PDFDocumentProxy.stats getter (PR 15758 follow-up)
This was deprecated in PR 15758 and given that it's quite unlikely that any third-party users are relying on this functionality, since it was only ever added to support telemetry reporting in the Firefox PDF Viewer, it should hopefully be fine to remove this fairly quickly.

These changes reduce the bundle size of the Firefox PDF Viewer by 4.5 kB in total.
2023-01-01 17:06:47 +01:00
Jonas Jenwald
2fcf8bb5be Re-factor searching for incomplete objects in XRef.indexObjects (issue 15803)
When trying to find incomplete objects, i.e. those missing the "endobj"-string at the end, there's unfortunately a number of possible operators that we need to check for. Otherwise we could miss e.g. the "trailer" at the end of a corrupt PDF document, which is why the referenced document didn't work.

Currently we do all searching on the "raw" bytes of the PDF document, for efficiency, however this doesn't really work when we need to check for *multiple* potential command-strings. To keep the complexity manageable we'll instead use regular expressions here, but we can at least avoid creating lots of substrings thanks to the `RegExp.lastIndex` property; which is well supported across browsers according to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex#browser_compatibility

Note that this repeated regular expression usage could perhaps be slightly less efficient than the old code, however this method is only invoked for corrupt PDF documents.
2022-12-19 23:01:09 +01:00
Calixte Denizet
f80880ccaa Strip out a reserved operator (9) from CFF char strings (fixes issue #15784) 2022-12-16 15:17:46 +01:00
Jonas Jenwald
26135b0313 Always parse the entire startXRefQueue in XRef.readXRef (issue 15833)
Previously we'd abort all parsing if an Error was encountered, despite the fact that multiple `startXRefQueue`-entries may be available and that continued parsing could thus eventually be able to find usable data.

Note that in the referenced PDF document the `startxref`-operator, at the end of the file, points to a position in the middle of an arbitrary `stream` which is why things break.
2022-12-15 13:46:28 +01:00
Calixte Denizet
0c1ec946aa [JS] Handle correctly choice widgets where the display and the export values are different (issue #15815) 2022-12-13 19:08:26 +01:00
Jonas Jenwald
5f8598abb7 [api-minor] Normalize the view-getter on the worker-thread
*Please note:* I don't really expect that this is will be an observable change, since virtually all PDF documents already order e.g. /MediaBox and /CropBox entries correctly.

By normalizing boundingBoxes already on the worker-thread, we can be sure that even a corrupt document won't cause issues.
Note how we're passing the `view`-getter to the `PartialEvaluator.getTextContent` method, in order to detect textContent which is outside of the page, hence it makes sense to ensure that it's formatted as expected.
Furthermore, by normalizing this once on the worker-tread we should no longer have to worry about a possibly negative width/height in the `PageViewport` constructor.

Finally, the patch also simplifies the `view`-getter a little bit.
2022-12-02 15:46:39 +01:00
Jonas Jenwald
aa5b678f94 Add default icons for FileAttachment annotations (bug 1230933)
*Please note:* This "borrows" the icons from Thunderbird.

According to the PDF specification, see https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2096626, we should be providing default icons for FileAttachment annotations without appearances.
2022-11-26 11:24:59 +01:00
Jonas Jenwald
4b02610e8c Re-factor and simplify the getQuadPoints helper function
The use of `Array.prototype.reduce()` is, in my opinion, hurting overall readability since it's not particularly easy to look at the relevant code and immediately understand what's going on here. Furthermore this code leads to strictly speaking unnecessary allocations and parsing, since we could just track the min/max values directly in the relevant loop instead.
2022-11-25 10:40:16 +01:00
Jonas Jenwald
d1c01b3164 Add a fallback for non-embedded *composite* Tahoma fonts (issue 15719) 2022-11-23 15:51:18 +01:00
Jonas Jenwald
2ff9799e7a Tweak assignment of common parameters in the Annotation classes
This is slightly more compact, and also unifies the format across the various classes.
2022-11-20 12:29:59 +01:00
Jonas Jenwald
c92de947b6 Reduce duplication when creating a fallback appearance for MarkupAnnotations
Currently we repeat the same color-conversion code verbatim in lots of classes, which seems completely unnecessary.
2022-11-20 12:05:25 +01:00
Tim van der Meij
d6908ee145
Merge pull request #15701 from Snuffleupagus/move-string-helpers
Move some string helper functions to the worker-thread
2022-11-19 11:20:07 +01:00
Jonas Jenwald
70d362f22c Remove an unnecessary variable in getPdfManager, in the src/core/worker.js file
Another tiny piece of clean-up, since adding a `catch`-handler to a Promise shouldn't require an intermediate variable.
2022-11-17 15:31:41 +01:00
Jonas Jenwald
a2a200175f Remove unnecessary function names in the src/core/worker.js file
Currently *some* functions in this file have names while others don't, and in a few cases the names are no longer entirely accurate.
For the relevant functions there should really be no need to name them, and if memory serves this was originally done since browsers (many years ago) didn't always handle anonymous functions correctly in stack traces.
2022-11-17 15:12:48 +01:00
Jonas Jenwald
9adc7859c8 Move the escapeString helper function into the worker-thread
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.
2022-11-16 12:35:48 +01:00
Jonas Jenwald
e5859e145d Move the isAscii helper function into the worker-thread
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.
2022-11-16 12:35:48 +01:00
Jonas Jenwald
2eaa708e3a Combine the stringToUTF16String and stringToUTF16BEString helper functions
Given that these functions are virtually identical, with the latter only adding a BOM, we can combine the two. Furthermore, since both functions were only used on the worker-thread, there's no reason to duplicate this functionality in both of the `pdf.js` and `pdf.worker.js` files.
2022-11-16 12:35:44 +01:00
Jonas Jenwald
f358e76f5b Move the _isOffscreenCanvasSupported property to the base Annotation class
Having just played around with adding FreeText-annotations and then trying to print, there were `FreeTextAnnotation: OffscreenCanvas is not supported, annotation may not render correctly.` messages printed in the console.
The reason for this is that `FreeTextAnnotation` inherits from `MarkupAnnotation`, however only `WidgetAnnotation` actually defines the `_isOffscreenCanvasSupported` property.
2022-11-15 16:30:53 +01:00