Commit Graph

5712 Commits

Author SHA1 Message Date
calixteman
0fca6e187c
Merge pull request #16035 from calixteman/fix_combo_value
[Annotation] A combo can have a value other than one in the options
2023-02-09 19:56:16 +01:00
Jonas Jenwald
1fc8350795
Merge pull request #16032 from Snuffleupagus/less-arrayByteLength
Reduce usage of the `arrayByteLength` helper function
2023-02-09 18:56:20 +01:00
Calixte Denizet
cb1638530d [Annotation] A combo can have a value other than one in the options
When printing the pdf in #12233 in Acrobat, we can see that the combo for country
is empty: it's because the V entry doesn't have to be one of the options.
2023-02-09 18:50:57 +01:00
calixteman
972744a68f
Merge pull request #16033 from calixteman/bug1640217
Ignore position of combining diacritics when getting text (bug 1640217)
2023-02-09 18:23:59 +01:00
calixteman
533a461db0
Merge pull request #16031 from calixteman/bug1770750
[Annotation] For choice widget, use the I entry instead of the V one (bug 1770750)
2023-02-09 18:01:28 +01:00
Calixte Denizet
58e4d92884 [Annotation] For choice widget, use the I entry instead of the V one (bug 1770750)
It isn't really conform to the specifications but Acrobat is working like that...
2023-02-09 17:26:13 +01:00
Calixte Denizet
4e9f26afa3 Ignore position of combining diacritics when getting text (bug 1640217) 2023-02-09 17:13:57 +01:00
Jonas Jenwald
96d338e437 Reduce usage of the arrayByteLength helper function
We're using this helper function when reading data from the [`PDFWorkerStreamReader.read`](a49d1d1615/src/core/worker_stream.js (L90-L98)) and [`PDFWorkerStreamRangeReader.read`](a49d1d1615/src/core/worker_stream.js (L122-L128)) methods, and as can be seen they always return `ArrayBuffer` data. Hence we can simply get the `byteLength` directly, and don't need to use the helper function.

Note that at the time when `arrayByteLength` was added we still supported browsers without TypedArray functionality, and we'd then simulate them using regular Arrays.
2023-02-09 15:50:38 +01:00
Jonas Jenwald
323d3d246a Re-factor the readChunk function in ChunkedStreamManager.sendRequest
Move the `done` branch to the top of the function, similar to how we usually format things when `ReadableStream`s are used.
2023-02-09 15:33:06 +01:00
Jonas Jenwald
9d29abdfa0 Change the LoopbackPort class to use a Set internally
This is a tiny bit more compact, thanks to the `Set.prototype.delete` method.
2023-02-09 12:34:41 +01:00
Calixte Denizet
c92ba393c2 Fix pinch-to-zoom on mac for the Firefox builtin viewer
In the mac case we don't want to care about the scaleFactor threshold
because else if too big another move could start and then subsequent
events aren't considered as wheel events.
It isn't really ideal and at some point we'll need to find a way at
least for the Firefox case to get the real events instead of the fake
wheel ones.
2023-02-08 15:04:41 +01:00
Calixte Denizet
a25895bf72 [Annotation] Take into account the stroke alpha for a FreeText without appearance 2023-02-07 22:15:27 +01:00
Calixte Denizet
ea7b4b4d6c [Annotation] Avoid to encrypt the appearance stream two times (bug 1815476) 2023-02-07 19:26:46 +01:00
Jonas Jenwald
0a0f3fc733 Move the main-thread CMap/StandardFontData factory initialization to getDocument
By default we're using worker-thread fetching (in browsers) of this data nowadays, however in Node.js environments or if the user provides custom factories we still fallback to main-thread fetching.
Hence it makes sense, as far as I'm concerned, to move this initialization into the `getDocument` function to ensure that the factories can actually be initialized *before* attempting to load the document.

Also, this further reduces the amount of `getDocument` parameters that we need to pass into into the `WorkerTransport` class.
2023-02-05 11:52:35 +01:00
Jonas Jenwald
ce8ac6d96a Only pass the necessary parameters to _fetchDocument and WorkerTransport
Currently we're passing all available parameters to this function respectively class, despite that not actually being necessary.
By splitting the parameters we not only improve the structure, and basically "document" the code a little bit, but we can also simplify the `_fetchDocument` function considerably.
2023-02-05 11:52:33 +01:00
Jonas Jenwald
512aa50fdd Re-factor the parameter parsing/validation in getDocument
This is very old code, where we loop through the user-provided options and build an internal parameter object. To prevent errors we also need to ensure that the parameters are correct/valid, which is especially important for the ones that are sent to the worker-thread such that structured cloning won't fail.[1]

Over the years this has led to more and more code being added in `getDocument` to validate the user-provided options, and at this point *most* of them have at least basic validation. However the way that this is implemented feels slightly backwards, since we first build the internal parameter object and only *afterwards* validate those parameters.[2]

Hence this patch changes the `getDocument` function to instead check/validate the supported options upfront, and then *explicitly* build the internal parameter object with only the needed properties.

---
[1] Note the supported types at https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#supported_types

[2] The internal parameter object may also, because of the loop, end up with lots of unnecessary properties since anything that the user provides is being copied.
2023-02-05 11:52:25 +01:00
Tim van der Meij
e698664927
Merge pull request #16004 from Snuffleupagus/WorkerTransport-cacheSimpleMethod
Improve how we cache Promises in `WorkerTransport`
2023-02-04 15:13:12 +01:00
Tim van der Meij
b75dafba87
Merge pull request #15987 from Snuffleupagus/onOpenWithTransport-params
Remove unused parameters from the `onOpenWithTransport` method in `PDFViewerApplication.initPassiveLoading`
2023-02-04 15:07:42 +01:00
Tim van der Meij
e848a0e61c
Merge pull request #15981 from Snuffleupagus/cMapPacked-true
[api-minor] Let the `cMapPacked` parameter, in `getDocument`, default to `true`
2023-02-04 15:00:26 +01:00
Jonas Jenwald
3a7fce49a3 A tiny improvement of the MetadataParser._repair method
We can just insert the initial greater-than sign at the start of the buffer, rather than doing that manually at the end.
2023-02-04 12:43:55 +01:00
Jonas Jenwald
2de03a7d91 Improve how we cache Promises in WorkerTransport
A number of methods have their Promises cached, to avoid repeated worker round-trips, since they're expected to be called more than once from the default viewer. The way that the caching is currently implemented means that we need to remember to manually clear these Promises on document cleanup/destruction, and it'd be nice to avoid that.

With this patch the relevant Promises are now instead placed in just one `Map`, which is easy to clear, and a new helper method is also introduced to reduce duplication for *simple* `WorkerTransport` methods.
2023-02-04 11:57:37 +01:00
Calixte Denizet
185281957d [Editor] Make the annotation editor layer invisible when disabled and empty
It'll help to avoid to consider them when the browser is restyling.
2023-02-01 17:53:44 +01:00
Jonas Jenwald
cf8ee47589 Remove unused parameters from the onOpenWithTransport method in PDFViewerApplication.initPassiveLoading
The only parameter that we actually need here is the `PDFDataRangeTransport`-instance, since the others are not necessary.
 - The `url` parameter, as passed to the `getDocument` function in the API, is simply being ignored; see 2d87a2eb1c/src/display/api.js (L447-L458)
 - The `length` parameter, as passed to the `getDocument` function in the API, is always being overwritten; see 2d87a2eb1c/src/display/api.js (L519-L525)
2023-02-01 09:33:22 +01:00
Jonas Jenwald
5e88228767 Allow, optionally, using worker-modules during local development
Until PR 12563 is deemed safe to land, I'd still like to be able to use worker-modules in the viewer during local development.
Hence this patch which *temporarily* adds a new `workerModules` hash-parameter, only available in non-PRODUCTION mode, that allows using worker-modules in the development viewer.

To enable this functionality, simply use http://localhost:8888/web/viewer.html#workerModules=true
2023-01-31 12:09:44 +01:00
Jonas Jenwald
c5d6391898 [api-minor] Let the cMapPacked parameter, in getDocument, default to true
The initial CMap support was added in PR 4259 using the "raw" Adobe files, however they were quickly deemed to be unnecessarily large. As a result PR 4470 introduced the more compact "binary" CMap format, with both of those PRs being included in the very same release (version `0.8.1334`) .

Please note that we've thus never shipped anything *except* the "binary" CMap files with the PDF library, and furthermore note that we've not even once updated the CMap files since they were originally added almost nine years ago.

Requiring users to remember that `cMapPacked = true` is necessary, in addition to setting the `cMapUrl` parameter, in order for CMap loading to work feels like a less than ideal API.
Hence this patch, which suggests that we simply let `cMapPacked` default to `true` now.
2023-01-30 15:35:02 +01:00
Jonas Jenwald
808ca828f1 Extend getGlyphMapForStandardFonts with additional entries (issue 15977) 2023-01-30 12:13:21 +01:00
Tim van der Meij
ee3be2f979
Merge pull request #15951 from Snuffleupagus/polyfill-Path2D
Polyfill `Path2D` in Node.js environments
2023-01-28 19:06:54 +01:00
Tim van der Meij
e539d2da1e
Merge pull request #15964 from Snuffleupagus/getDocument-non-object
Only accept non-objects passed to `getDocument` in GENERIC builds
2023-01-28 18:42:09 +01:00
Jonas Jenwald
cf0369d622 Polyfill Path2D in Node.js environments
Until just recently the only existing `Path2D` polyfill didn't have support for Node.js and/or the `node-canvas` package. Given that this was just fixed, in the latest version, we can now finally remove our inline-checks at the relevant call-sites; please also see https://github.com/nilzona/path2d-polyfill#usage-with-node-canvas
2023-01-28 18:28:22 +01:00
Tim van der Meij
cb5a28ceca
Merge pull request #15954 from Snuffleupagus/getDocument-URL-tweaks
Tweak the internal handling of the `url`-parameter in `getDocument` (PR 13166 follow-up)
2023-01-28 18:18:17 +01:00
Tim van der Meij
16aef95937
Merge pull request #15966 from Snuffleupagus/GlobalWorkerOptions-defaults
Simplify setting the `GlobalWorkerOptions` default values (PR 9480 follow-up)
2023-01-28 18:15:32 +01:00
Calixte Denizet
6f4d037a8e [JS] Correctly format field with numbers (bug 1811694, bug 1811510)
In PR #15757, a value is automatically converted into a number when it's possible
but the case of numbers like "000123" has been overlooked and their format must
be preserved.
When a script is doing something like "foo.value + bar.value" and the values are
numbers then "foo.value" must return a number but the displayed value must be what
the user entered or what a script set, so this patch is just adding a a field
_orginalValue in order to track the value has it has defined.
Some people are used to use a comma as decimal separator, hence it must be considered
when a value is parsed into a number.
This patch is fixing a regression introduced by #15757.
2023-01-26 14:57:02 +01:00
Jonas Jenwald
1c4af2727c Simplify setting the GlobalWorkerOptions default values (PR 9480 follow-up)
There's really no need for these "complicated" default value assignments, since `GlobalWorkerOptions` is a local variable at this point, and this is rather a case of too much copy-and-paste.
Note that years ago, when all options were set using a global `PDFJS` object, it's possible that options had been set (from the outside) *before* the object had been properly initialized; see e.g. a89071bdef/src/display/global.js
2023-01-26 14:16:01 +01:00
Jonas Jenwald
4758e6649c Only accept non-objects passed to getDocument in GENERIC builds
In general it's always recommended to pass a *parameter object* when calling the `getDocument`-function in the API, since that's the only way to provide additional options, and the fact that it also accepts a URL or TypedArray directly is now mostly for backwards compatibility reasons.
Unfortunately we cannot really remove this, since that code has existed since "forever", however we can limit it to only the GENERIC build to avoid completely unnecessary checks in e.g. the Firefox PDF Viewer.

Finally, note that the default-viewer always provides a *parameter object* when calling the `getDocument`-function and it's thus completely unaffected by these changes.
2023-01-26 10:48:58 +01:00
Jonas Jenwald
755319130e Tweak the internal handling of the url-parameter in getDocument (PR 13166 follow-up)
- Use a `URL`-instance directly, since it's by definition an absolute URL.
 - Actually limit the "raw" url-string handling to Node.js environments, as intended.
 - Skip the warning, since we're already throwing an Error if the `url`-parameter is invalid.
2023-01-24 11:18:41 +01:00
Tim van der Meij
edfdb693e5
Merge pull request #15948 from Snuffleupagus/bug-1811668
Tweak `adjustType1ToUnicode` for fonts with a predefined *named* encoding (bug 1811668, PR 14050 follow-up)
2023-01-21 14:02:40 +01:00
Tim van der Meij
a27d7ba524
Merge pull request #15943 from Snuffleupagus/deprecate-direct-PDFDataRangeTransport
[api-minor] Deprecate calling `getDocument` directly with a `PDFDataRangeTransport`-instance
2023-01-21 13:50:20 +01:00
Jonas Jenwald
40a46e4397 Tweak adjustType1ToUnicode for fonts with a predefined *named* encoding (bug 1811668, PR 14050 follow-up)
*Please note:* I cannot reproduce the problem reported in bug 1811668, regarding the context menu, and in any case it's not clear that that part is even a PDF Viewer bug.

Looking at bug 1811668 I couldn't help but noticing that the textLayer isn't correct, and it's unfortunately once again a problem with the `adjustType1ToUnicode` function. That's intended to help improve text-selection for fonts without a /ToUnicode-entry, and in many cases it does help (the original PR fixed lots of issues) however it's also caused some problems.

In order to improve text-selection in bug 1811668, we'll now properly ignore fonts that have a predefined *named* encoding specified since that's really the intention with PR 14050.
2023-01-21 12:21:21 +01:00
Jonas Jenwald
f2fce93826 [JBIG2] Ensure that the decodeInteger function returns valid integers (issue 15942)
The JBIG2 images in this PDF document are corrupt enough that even Adobe Reader warns about it when opening the file.
*Please note:* I don't really know the JBIG2 image format at all, however from a very brief look at the specification it seems that integers should be 32-bit.
2023-01-19 17:14:17 +01:00
Jonas Jenwald
7976fc7851 [api-minor] Deprecate calling getDocument directly with a PDFDataRangeTransport-instance
In general it's recommended to pass a *parameter object* when calling the `getDocument`-function in the API, since that's the only way to provide additional options, and the fact that it also accepts a URL or TypedArray directly is now mostly for backwards compatibility reasons.
However, the `getDocument`-function also accepts a direct `PDFDataRangeTransport`-instance which just seems unnecessary.

*Please note:* The `PDFDataRangeTransport`-implementation was added specifically for the *built-in* Firefox PDF Viewer, however it's most likely not commonly used by any third-party (given that it requires manual PDF-data loading).
Furthermore, the default-viewer always provides a *parameter object* when calling the `getDocument`-function and it's thus completely unaffected by these changes.
2023-01-19 14:25:55 +01:00
Jonas Jenwald
7b36686fca Update the year in the license_header files 2023-01-18 22:28:18 +01:00
Jonas Jenwald
d6be5141e9 Fallback to using the name table to infer the encoding for TrueType fonts missing such data (issue 15910)
The relevant TrueType font is missing both /ToUnicode *and* /Encoding entires, either of which would have prevented the (current) broken textLayer rendering.
My first idea was that we could use the `post` table in the TrueType font, see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6post.html, to get the actual glyphNames and amend the fallback ToUnicode-map that way. Unfortunately that didn't work, since the `post` table only contained ".notdef" and "" (i.e. empty string) entries.

Instead we try to use the `name` table in the TrueType font, see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6name.html, to determine if the platform is Windows and thus fallback to generate a ToUnicode-map from the `WinAnsiEncoding`.
2023-01-17 16:04:51 +01:00
Jonas Jenwald
d8d5545e03
Merge pull request #15926 from Snuffleupagus/annotation-appearance-stream
Ensure that Annotation `appearance`-entries are actually Streams
2023-01-16 15:00:12 +01:00
Jonas Jenwald
cefaecc2e8 Ensure that Annotation appearance-entries are actually Streams
Note how all over the `src/core/annotation.js`-code we're assuming that if an `appearance`-entry exists it's also a Stream. However, we're not actually checking that thoroughly enough which causes issues in some badly generated PDF documents.
2023-01-16 13:02:53 +01:00
Jonas Jenwald
397f943ca3 [api-minor] Enable transferring of TypedArray PDF data by default (PR 15908 follow-up)
This patch removes the recently introduced `transferPdfData` API-option, and simply enables transferring of TypedArray data *by default* instead of copying it. This will help reduce main-thread memory usage, however it will take ownership of the TypedArrays. Currently this only applies to the following cases:
 - TypedArrays passed to the `getDocument`-function in the API, in order to open PDF documents from binary data.
 - TypedArrays passed to a `PDFDataRangeTransport`-instance, used to support custom PDF document fetching/loading (see e.g. the Firefox PDF Viewer).

*PLEASE NOTE:* To avoid being affected by this, please simply *copy* any TypedArray data before passing it to either of the functions/methods mentioned above.

Now that we transfer TypedArray data that we previously only copied, we need to be more careful with input validation. Given how the `{IPDFStreamReader, IPDFStreamRangeReader}.read` methods will always return ArrayBuffer data, which is then transferred to the worker-thread[1], the actual TypedArray data passed to the API thus need to have the same exact size as its underlying ArrayBuffer to prevent issues.
Hence we'll check for this and only allow transferring of *safe* TypedArray data, and fallback to simply copying the data just as before. This obviously shouldn't be an issue in the Firefox PDF Viewer, but for the general PDF.js library we need to be more careful here.

---
[1] See e09ad99973/src/display/api.js (L2492-L2506) respectively e09ad99973/src/display/api.js (L2578-L2590)
2023-01-14 10:39:36 +01:00
Jonas Jenwald
99cfab18c1 Combine the array-like and ArrayBuffer branches, when handling binary data, in getDocument 2023-01-13 13:28:44 +01:00
Jonas Jenwald
e09ad99973
Merge pull request #15916 from Snuffleupagus/fetch-transfer
[api-minor] Enabling transferring of data fetched with the `PDFFetchStream` implementation
2023-01-13 13:28:12 +01:00
Jonas Jenwald
1362cd91d0 Improve input validation in PDFDataTransportStream._onReceiveData (PR 15908 follow-up)
The mozilla-central [method `PdfDataListener.readData`](https://searchfox.org/mozilla-central/rev/893a8f062ec6144c84403fbfb0a57234418b89cf/toolkit/components/pdfjs/content/PdfStreamConverter.jsm#207-210) can return `null`, hence it seems like a very good idea to update `PDFDataTransportStream._onReceiveData` to handle that gracefully since the current code will throw in that case.

Also, improves the JSDocs for the `PDFDataRangeTransport` class in the API.
2023-01-12 15:24:59 +01:00
Jonas Jenwald
cee97fcd15 [api-minor] Enabling transferring of data fetched with the PDFFetchStream implementation
Note how in the API we're transferring the PDF data that's fetched over the network[1]:
 - f28bf23a31/src/display/api.js (L2467-L2480)
 - f28bf23a31/src/display/api.js (L2553-L2564)

To support that functionality we have the `PDFDataTransportStream`, `PDFFetchStream`, `PDFNetworkStream`, and `PDFNodeStream` implementations. Here these stream-implementations vary slightly in how they handle `ArrayBuffer`s internally, w.r.t. transferring or copying the data:
 - In `PDFDataTransportStream` we optionally, after PR 15908, allow transferring of the PDF data as provided externally (used e.g. in the Firefox PDF Viewer).
 - In `PDFFetchStream` we're currenly always copying the PDF data returned by the Fetch API, which seems unnecessary. As discussed in PR 15908, it'd seem very weird if this sort of browser API didn't allow transferring of the returned data.
 - In `PDFNetworkStream` we're already, since many years, transferring the PDF data returned by the `XMLHttpRequest` functionality. Note how the `getArrayBuffer` helper function simply returns an `ArrayBuffer` response as-is.
 - In `PDFNodeStream` we're currently copying the PDF data, however this is unfortunately necessary since Node.js returns data as a `Buffer` object[2].

Given that the `PDFNetworkStream` has been, indirectly, supporting transferring of PDF data for years it would seem really strange if this didn't also apply to the `PDFFetchStream`-implementation.
Hence this patch simply enables transferring of PDF data, when accessed using the Fetch API, unconditionally to help reduced main-thread memory usage since the `PDFFetchStream`-implementation is used *by default* in browsers (for the GENERIC build).

---
[1] As opposed to PDF data being provided as e.g. a TypedArray when calling `getDocument` in the API.

[2] This is a "special" Node.js object, see https://nodejs.org/api/buffer.html#buffer, which doesn't exist in browsers.
2023-01-12 13:59:21 +01:00
Jonas Jenwald
bbe629018d [api-minor] Add a new transferPdfData option to allow transferring more data to the worker-thread (bug 1809164)
Also, removes the `initialData`-parameter JSDocs for the `getDocument`-function given that this parameter has been completely unused since PR 8982 (over five years ago). Note that the `initialData`-parameter is, and always was, intended to be provided when initializing a `PDFDataRangeTransport`-instance.
2023-01-10 21:03:44 +01:00