Commit Graph

1064 Commits

Author SHA1 Message Date
Jonas Jenwald
d4bcfe8c16 Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176)
Hopefully this makes sense, since we already "create" multi-byte ToUnicode entries in other cases (see e.g. the `getNormalizedUnicodes` table).
2023-03-21 21:35:57 +01:00
Jonas Jenwald
c4a725fe98 Fix the transfer parameter, for structuredClone, in the LoopbackPort
The way that we handle the `transfer` parameter is unfortunately wrong, ever since PR 14392 which introduced the code, given that the MDN article originally contained incorrect information; please see https://github.com/mdn/content/pull/23164

By updating the `structuredClone` call such that it works correctly, we can enable more unit-tests in Node.js environments; please refer to https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#parameters
2023-03-19 22:04:01 +01:00
Jonas Jenwald
0e54a3c37a Warn about missing/incorrect --scale-factor CSS-variable in renderTextLayer (issue 16139)
Unfortunately I don't believe that we can simply add a default `--scale-factor` CSS-variable to the `container`-element, since that might not be entirely appropriate/correct in all cases.[1]
However, we can at least print a console-error to hopefully make this situation more apparent to users. (This is purposely not using the `warn` helper-function, since those messages can be disabled.)

---
[1] One example is in our reference-tests, where we don't need to add it to the `container`-element itself.
2023-03-16 11:53:12 +01:00
calixteman
b2a86350fc
Merge pull request #16096 from bungeman/fix_trig_functions
Correct PostScript trigonometric operators
2023-03-11 14:32:23 +01:00
Calixte Denizet
07b094729e Fix search in pdf a containing some UTF-32 characters (bug 1820909)
Some chars were supposed to have a length equals to 1 but UTF-32 chars
can be longuer.
2023-03-09 15:03:01 +01:00
Calixte Denizet
b8dda089e2 Slightly modify the max width of a tracking space 2023-03-07 19:38:49 +01:00
Ben Wagner
158c836e26 Correct PostScript trigonometric operators
PDF 32000-1:2008 7.10.5.1 "Type 4 (PostScript Calculator) Functions"
defers to the PostScript Language Reference for the description of these
functions. The PostScript Language Reference, third edition chapter 8
"Operators" defines the `angle` type as a "number of degrees". Section
8.1 defines "angle `sin` real", "angle `cos` real", and "num den `atan`
angle". The documentation for `atan` further states that it will return
an angle in degrees between 0 and 360.

Handle these operators correctly in `PostScriptEvaluator.execute`.
Convert the inputs to `sin` and `cos` from degrees to radians for use
with `Math.sin` and `Math.cos`. Correctly pop two values from the stack
for `atan`, use `Math.atan2`, and convert from radians to (positive)
degrees.
2023-03-03 17:25:11 -05:00
Calixte Denizet
fd03cd5493 [api-minor] Generate images in the worker instead of the main thread.
We introduced the use of OffscreenCanvas in #14754 and this patch aims
to use them for all kind of images.
It'll slightly improve performances (and maybe slightly decrease memory use).
Since an image can be rendered in using some transfer maps but because of
OffscreenCanvas we don't have the underlying pixels array the transfer maps
stuff is re-implemented in using the SVG filter feComponentTransfer.
2023-03-01 17:40:12 +01:00
Jonas Jenwald
f42a2e8451
[api-minor] Move the canvasFactory option into getDocument
Rather than repeatedly initializing a `canvasFactory`-instance for every page, move it to the document-level instead.

*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
2023-03-01 09:07:16 +01:00
Calixte Denizet
3a21423386 [Acroform] Use the full path to find the node in the XFA datasets where to store the value
I noticed several 'Path not found' errors because of a field called #subform[2].
From the XFA specs, the hash is used for a class of elements in the template tree.
When we're looking for a node in the datasets tree, it doesn't make sense to search
for a class. Hence the path element starting with a hash are just skipped.
2023-02-23 12:09:39 +01:00
Calixte Denizet
fc7d74385f Don't replace an eol by a whitespace when the last char is a Katakana-Hiragana diacritic 2023-02-16 11:31:58 +01:00
Jonas Jenwald
6d4d402a78 Move the arrayBuffersToBytes helper function into the worker-thread
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the *built* `pdf.js` and `pdf.worker.js` files.
2023-02-11 21:34:37 +01:00
Calixte Denizet
4e9f26afa3 Ignore position of combining diacritics when getting text (bug 1640217) 2023-02-09 17:13:57 +01:00
Jonas Jenwald
90ffbc1d39 Remove most build-time require statements from the viewer (PR 16009 follow-up)
This further extends the web-specific import maps introduced in PR 16009, to allow removing *most* of the build-time `require` statements from the viewer. The few remaining ones are fallbacks used for the COMPONENTS respectively the `legacy` GENERIC builds.
2023-02-07 22:45:19 +01:00
calixteman
ecd86ccffc
Merge pull request #16020 from calixteman/bug1815476
[Annotation] Avoid to encrypt the appearance stream two times (bug 1815476)
2023-02-07 20:57:49 +01:00
Calixte Denizet
ea7b4b4d6c [Annotation] Avoid to encrypt the appearance stream two times (bug 1815476) 2023-02-07 19:26:46 +01:00
Jonas Jenwald
a98e80c4ff [GeckoView] Reduce the size of the *built* viewer
Given that the GV-viewer isn't using most of the UI-related components of the default-viewer, we can avoid including them in the *built* viewer to save space.[1]
The least "invasive" way of implementing this, at least that I could come up with, is to leverage import maps with suitable stubs for the GV-viewer.

The one slightly annoying thing is that we now have larger import maps across multiple html-files, and you'll need to remember to update all of them when making future changes.

---
[1] With this patch, the built `viewer.js` size is 391 kB and `viewer-geckoview.js` is 285 kB.
2023-02-05 14:12:32 +01:00
Tim van der Meij
e848a0e61c
Merge pull request #15981 from Snuffleupagus/cMapPacked-true
[api-minor] Let the `cMapPacked` parameter, in `getDocument`, default to `true`
2023-02-04 15:00:26 +01:00
Jonas Jenwald
851c394e64 Remove the isEmptyObj unit-test helper function
We should be able to let Jasmine simply compare directly against an actually empty Object, rather than using a manually implemented helper function for that.
2023-02-04 12:43:53 +01:00
Jonas Jenwald
c5d6391898 [api-minor] Let the cMapPacked parameter, in getDocument, default to true
The initial CMap support was added in PR 4259 using the "raw" Adobe files, however they were quickly deemed to be unnecessarily large. As a result PR 4470 introduced the more compact "binary" CMap format, with both of those PRs being included in the very same release (version `0.8.1334`) .

Please note that we've thus never shipped anything *except* the "binary" CMap files with the PDF library, and furthermore note that we've not even once updated the CMap files since they were originally added almost nine years ago.

Requiring users to remember that `cMapPacked = true` is necessary, in addition to setting the `cMapUrl` parameter, in order for CMap loading to work feels like a less than ideal API.
Hence this patch, which suggests that we simply let `cMapPacked` default to `true` now.
2023-01-30 15:35:02 +01:00
Calixte Denizet
6f4d037a8e [JS] Correctly format field with numbers (bug 1811694, bug 1811510)
In PR #15757, a value is automatically converted into a number when it's possible
but the case of numbers like "000123" has been overlooked and their format must
be preserved.
When a script is doing something like "foo.value + bar.value" and the values are
numbers then "foo.value" must return a number but the displayed value must be what
the user entered or what a script set, so this patch is just adding a a field
_orginalValue in order to track the value has it has defined.
Some people are used to use a comma as decimal separator, hence it must be considered
when a value is parsed into a number.
This patch is fixing a regression introduced by #15757.
2023-01-26 14:57:02 +01:00
Tim van der Meij
a27d7ba524
Merge pull request #15943 from Snuffleupagus/deprecate-direct-PDFDataRangeTransport
[api-minor] Deprecate calling `getDocument` directly with a `PDFDataRangeTransport`-instance
2023-01-21 13:50:20 +01:00
Calixte Denizet
dc94b750de [GV] Avoid to update the finder when the results aren't complete
At the beginning of a search we can an update can be triggered with 0 over 0
found matches.
In the GeckoView context, we can't update the finder whenever we want but only
when it has been required.
2023-01-20 18:13:16 +01:00
Jonas Jenwald
7976fc7851 [api-minor] Deprecate calling getDocument directly with a PDFDataRangeTransport-instance
In general it's recommended to pass a *parameter object* when calling the `getDocument`-function in the API, since that's the only way to provide additional options, and the fact that it also accepts a URL or TypedArray directly is now mostly for backwards compatibility reasons.
However, the `getDocument`-function also accepts a direct `PDFDataRangeTransport`-instance which just seems unnecessary.

*Please note:* The `PDFDataRangeTransport`-implementation was added specifically for the *built-in* Firefox PDF Viewer, however it's most likely not commonly used by any third-party (given that it requires manual PDF-data loading).
Furthermore, the default-viewer always provides a *parameter object* when calling the `getDocument`-function and it's thus completely unaffected by these changes.
2023-01-19 14:25:55 +01:00
Jonas Jenwald
8f3fa18c93
Merge pull request #15920 from Snuffleupagus/transfer-pdf-data
[api-minor] Enable transferring of TypedArray PDF data by default (PR 15908 follow-up)
2023-01-16 13:20:57 +01:00
Tim van der Meij
9e3adb5ec7
Implement unit tests for the numberToString utility function 2023-01-14 15:09:58 +01:00
Tim van der Meij
a6dfcc89fa
Implement unit tests for the recoverJsURL utility function 2023-01-14 15:09:58 +01:00
Jonas Jenwald
397f943ca3 [api-minor] Enable transferring of TypedArray PDF data by default (PR 15908 follow-up)
This patch removes the recently introduced `transferPdfData` API-option, and simply enables transferring of TypedArray data *by default* instead of copying it. This will help reduce main-thread memory usage, however it will take ownership of the TypedArrays. Currently this only applies to the following cases:
 - TypedArrays passed to the `getDocument`-function in the API, in order to open PDF documents from binary data.
 - TypedArrays passed to a `PDFDataRangeTransport`-instance, used to support custom PDF document fetching/loading (see e.g. the Firefox PDF Viewer).

*PLEASE NOTE:* To avoid being affected by this, please simply *copy* any TypedArray data before passing it to either of the functions/methods mentioned above.

Now that we transfer TypedArray data that we previously only copied, we need to be more careful with input validation. Given how the `{IPDFStreamReader, IPDFStreamRangeReader}.read` methods will always return ArrayBuffer data, which is then transferred to the worker-thread[1], the actual TypedArray data passed to the API thus need to have the same exact size as its underlying ArrayBuffer to prevent issues.
Hence we'll check for this and only allow transferring of *safe* TypedArray data, and fallback to simply copying the data just as before. This obviously shouldn't be an issue in the Firefox PDF Viewer, but for the general PDF.js library we need to be more careful here.

---
[1] See e09ad99973/src/display/api.js (L2492-L2506) respectively e09ad99973/src/display/api.js (L2578-L2590)
2023-01-14 10:39:36 +01:00
Jonas Jenwald
bbe629018d [api-minor] Add a new transferPdfData option to allow transferring more data to the worker-thread (bug 1809164)
Also, removes the `initialData`-parameter JSDocs for the `getDocument`-function given that this parameter has been completely unused since PR 8982 (over five years ago). Note that the `initialData`-parameter is, and always was, intended to be provided when initializing a `PDFDataRangeTransport`-instance.
2023-01-10 21:03:44 +01:00
Jonas Jenwald
1d5de9f4f4 Inline the setPDFNetworkStreamFactory functionality in src/display/api.js
Given that this is internal functionality, not exposed in the official API, it's not entirely clear (at least to me) why we can't just initialize this directly in `src/display/api.js` instead.
When testing both the development viewer and all the ways in which we run tests, everthing still appears to work just fine with this patch.
2023-01-06 13:23:07 +01:00
Calixte Denizet
661f425934 [GV] Add an option in the find controller to update matches count only when the last page is reached (bug 1803188).
In GeckoView, on an event, a callback must be executed with the result of an action,
but the callback can be used only one time.
So for each FindInPage event, we must trigger only one matches count update.
2023-01-06 10:56:26 +01:00
Jonas Jenwald
42aa08563b
Merge pull request #15880 from Snuffleupagus/rm-docStats
[api-minor] Remove the `PDFDocumentProxy.stats` getter (PR 15758 follow-up)
2023-01-02 16:03:53 +01:00
Calixte Denizet
69c88477a9 Avoid an infinite loop when searching for a single diacritic 2023-01-02 12:27:07 +01:00
Jonas Jenwald
0c1fb4e740 [api-minor] Remove the PDFDocumentProxy.stats getter (PR 15758 follow-up)
This was deprecated in PR 15758 and given that it's quite unlikely that any third-party users are relying on this functionality, since it was only ever added to support telemetry reporting in the Firefox PDF Viewer, it should hopefully be fine to remove this fairly quickly.

These changes reduce the bundle size of the Firefox PDF Viewer by 4.5 kB in total.
2023-01-01 17:06:47 +01:00
Jonas Jenwald
91524d1a60 [api-minor] Allow specifying an extra-delay, in RenderTask.cancel, for worker-thread aborting of operatorList parsing
This is done to support upcoming viewer-changes, and in order to prevent third-party users from outright breaking things we'll simply ignore too large values.
2022-12-14 12:34:16 +01:00
Jonas Jenwald
fe8fded23b [api-minor] Combine the textContent/textContentStream parameters
Rather than handling these parameters separately, which is a left-over from back when streaming of textContent was originally added, we can simply pass either data directly to the `TextLayer` and let it handle things accordingly.

Also, improves a few JSDoc comments and `typedef`-imports.
2022-12-04 21:22:14 +01:00
Tim van der Meij
99cfef882f
Merge pull request #15752 from Snuffleupagus/no-typeof-undefined
Enable the `no-typeof-undefined` ESLint plugin rule
2022-12-02 19:48:16 +01:00
Calixte Denizet
eed9bf71c5 Refactor the text layer code in order to avoid to recompute it on each draw
The idea is just to resuse what we got on the first draw.
Now, we only update the scaleX of the different spans and the other values
are dependant of --scale-factor.
Move some properties in the CSS in order to avoid any updates in JS.
2022-12-01 18:42:43 +01:00
Jonas Jenwald
47dbfc4ade Enable the no-typeof-undefined ESLint plugin rule
Please see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/no-typeof-undefined.md
2022-12-01 18:20:39 +01:00
Calixte Denizet
ea1995991b Don't add an extra space after a Katakana or a Hiragana at the eol when searching 2022-11-29 10:46:48 +01:00
Calixte Denizet
ae7da6ae48 [JS] By default, a text field value must be treated as a number (bug 1802888) 2022-11-28 16:24:01 +01:00
Calixte Denizet
4ee0c83548 [JS] Fix a rounding issue in printf (bug 1802888) 2022-11-28 14:37:15 +01:00
Jonas Jenwald
0ba242ea4a Support FileAttachments with hash-signs in the filename (issue 15729)
The reason for the issue is that we use the generic `getFilenameFromUrl` helper function, which was originally intended for regular URLs.
For the filenames we're dealing with in FileAttachments, we really only want to strip the path when one exists[1].

---
[1] See [bug 1230933](https://bugzilla.mozilla.org/show_bug.cgi?id=1230933) for an example of such a case.
2022-11-23 10:47:33 +01:00
Jonas Jenwald
7d029f8bfe Add a basic stringToUTF16HexString unit-test 2022-11-16 12:39:35 +01:00
Jonas Jenwald
9adc7859c8 Move the escapeString helper function into the worker-thread
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.
2022-11-16 12:35:48 +01:00
Jonas Jenwald
e5859e145d Move the isAscii helper function into the worker-thread
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.
2022-11-16 12:35:48 +01:00
Jonas Jenwald
2eaa708e3a Combine the stringToUTF16String and stringToUTF16BEString helper functions
Given that these functions are virtually identical, with the latter only adding a BOM, we can combine the two. Furthermore, since both functions were only used on the worker-thread, there's no reason to duplicate this functionality in both of the `pdf.js` and `pdf.worker.js` files.
2022-11-16 12:35:44 +01:00
Calixte Denizet
2be64d63e1 Normalize fullwidth, halfwidth and circled chars when searching 2022-11-14 19:27:51 +01:00
Calixte Denizet
3ca03603c2 [Annotation] Fix printing/saving for annotations containing some non-ascii chars and with no fonts to handle them (bug 1666824)
- For text fields
 * when printing, we generate a fake font which contains some widths computed thanks to
   an OffscreenCanvas and its method measureText.
   In order to avoid to have to layout the glyphs ourselves, we just render all of them
   in one call in the showText method in using the system sans-serif/monospace fonts.
 * when saving, we continue to create the appearance streams if the fonts contain the char
   but when a char is missing, we just set, in the AcroForm dict, the flag /NeedAppearances
   to true and remove the appearance stream. This way, we let the different readers handle
   the rendering of the strings.
- For FreeText annotations
  * when printing, we use the same trick as for text fields.
  * there is no need to save an appearance since Acrobat is able to infer one from the
    Content entry.
2022-11-10 19:05:39 +01:00
Samuel Yuan
36fb5c1e2b Propagate the translated font name to TextContentItems.
This allows font data for system fonts to be looked up in the
PDFObjects.
2022-11-08 11:16:21 -08:00