Commit Graph

1186 Commits

Author SHA1 Message Date
Calixte Denizet
3091e70aad Flush the current chunk when the font changed because of a restore op (issue #14755) 2023-05-18 19:37:16 +02:00
Calixte Denizet
385f275ad9 Warn when pdf.js can't load an OS font 2023-05-16 14:58:38 +02:00
Jonas Jenwald
cb1a10e358 Check the css property in the getFontSubstitution unit-tests
Given that the `css` property isn't constant, since it contains document/font ids, we cannot just check it directly. However, we can make use of regular expressions to ensure that the format is generally correct.
2023-05-14 19:11:35 +02:00
calixteman
4101128c09
Merge pull request #16421 from calixteman/font_subst_test
Add tests for the font substitution
2023-05-14 18:23:12 +02:00
Calixte Denizet
89140fcd98 Add tests for the font substitution 2023-05-14 18:07:03 +02:00
Jonas Jenwald
8fbd6755eb Enable the unicorn/no-useless-promise-resolve-reject ESLint plugin rule
Please see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/no-useless-promise-resolve-reject.md

Note that this patch also re-sorts the existing `unicorn`-rules in proper alphabetical order.
2023-05-13 11:30:25 +02:00
Jonas Jenwald
8f3940fbf3 Move the sidebar-resizing handling into the PDFSidebar class
Originally the `PDFSidebarResizer` class was slightly larger, since the code used to contain e.g. feature testing for older (and no longer supported) browsers.
Given that there's some amount of overlap, when it comes to what DOM-elements and state that these classes need, it now seems reasonable to simply move the sidebar-resizing into the `PDFSidebar` class.

For the MOZCENTRAL build-target this patch reduces the size of the *built* `web/viewer.js` file by just over `1.1` kilobytes.
2023-05-12 10:00:12 +02:00
Calixte Denizet
cfb908c999 Add a cache to avoid to load several times a local font
On my computer, it takes few tenths of a second to load a local font.
Since a font can be used several times in a document, the cache will
improve performances.
2023-05-10 20:01:21 +02:00
Calixte Denizet
2486536843 Compress the data when saving annotions
CompressionStream API has been added in Firefox 113
(see https://bugzilla.mozilla.org/show_bug.cgi?id=1823619)
hence we can use it to compress the streams with added/modified
annotations.
2023-05-09 14:46:50 +02:00
Jonas Jenwald
317abd6d07 Change the createPromiseCapability helper function into a PromiseCapability class
This is not only slightly more compact, but it also simplifies the handling of the `settled` getter.
2023-04-29 13:43:24 +02:00
Calixte Denizet
19ca41896e Correctly clip the text in the text layer (fixes #16316) 2023-04-18 17:00:42 +02:00
Calixte Denizet
117bbf7cd9 [api-minor] Don't normalize the text used in the text layer.
Some arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized
when creating the search query. So to avoid to duplicate the normalization code,
everything is moved in the find controller.
The previous code to normalize text was using NFKC but with a hardcoded map, hence it
has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size
by 30kb).
In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into
account some RTL unicode ranges, the generated font wasn't embedding the mapping this
char and the unicode ranges in the OS/2 table weren't up-to-date.

When normalized some chars can be replaced by several ones and it induced to have
some extra chars in the text layer. To avoid any regression, when copying some text
from the text layer, a copied string is normalized (NFKC) before being put in the
clipboard (it works like this in either Acrobat or Chrome).
2023-04-17 14:31:23 +02:00
Jonas Jenwald
0e19c3a120 [api-minor] Add support, in PDFFindController, for mixing phrase/word searches (issue 7442)
*Please note:* This patch only extends the `PDFFindController` implementation itself to support this functionality, however it's *purposely* not exposed in the default viewer.

This replaces the previous `phraseSearch`-parameter, and a `query`-string will now always be interpreted as a phrase-search.
To enable searching for individual words, the `query`-parameter must instead consist of an Array of strings. This way it's now also possible to combine phrase/word searches, with a `query`-parameter looking something like `["Lorem ipsum", "foo", "bar"]` which will search for the phrase "Lorem ipsum" *and* the words "foo" respectively "bar".
2023-04-15 13:32:37 +02:00
Calixte Denizet
d8795f9f8f Fix search of numbers inside fractions 2023-04-11 20:57:26 +02:00
Jonas Jenwald
5063a6f2a9 [api-minor] Remove the disableCombineTextItems option
*Please note:* This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons.

This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous *default* text-extraction behaviour.
However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in *more* (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together.

Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.
2023-03-30 14:23:38 +02:00
Calixte Denizet
a96f10e55d Create a new chunk when the char is too rised compared to the previouse one 2023-03-28 13:56:46 +02:00
Jonas Jenwald
1fc09f0235 Enable the unicorn/prefer-string-replace-all ESLint plugin rule
Note that the `replaceAll` method still requires that a *global* regular expression is used, however by using this method it's immediately obvious when looking at the code that all occurrences will be replaced; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll#parameters

Please find additional details at https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-string-replace-all.md
2023-03-23 12:57:10 +01:00
Jonas Jenwald
5f64621d46 Use String.prototype.replaceAll() where appropriate
This fairly new method allows replacing *multiple* occurrences within a string without having to use regular expressions.

Please refer to:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll#browser_compatibility
2023-03-22 15:31:10 +01:00
Jonas Jenwald
137a2d6e30 Add even more non-standard ligatures (PR 15517 follow-up)
Given that we already create multi-byte ToUnicode entries in other cases, see e.g. the `getNormalizedUnicodes` table, this is hopefully fine.
2023-03-22 10:42:52 +01:00
Jonas Jenwald
9321758d91
Merge pull request #16186 from Snuffleupagus/issue-16176
Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176)
2023-03-21 22:17:18 +01:00
Jonas Jenwald
d4bcfe8c16 Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176)
Hopefully this makes sense, since we already "create" multi-byte ToUnicode entries in other cases (see e.g. the `getNormalizedUnicodes` table).
2023-03-21 21:35:57 +01:00
calixteman
8bfebf1c24
Merge pull request #16188 from calixteman/bug1823296
Use the position of the previous xref stream if any when saving a pdf (bug 1823296)
2023-03-21 21:21:49 +01:00
Calixte Denizet
2d0f30a67c Use the position of the previous xref stream if any when saving a pdf (bug 1823296) 2023-03-21 19:27:24 +01:00
Jonas Jenwald
c4a725fe98 Fix the transfer parameter, for structuredClone, in the LoopbackPort
The way that we handle the `transfer` parameter is unfortunately wrong, ever since PR 14392 which introduced the code, given that the MDN article originally contained incorrect information; please see https://github.com/mdn/content/pull/23164

By updating the `structuredClone` call such that it works correctly, we can enable more unit-tests in Node.js environments; please refer to https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#parameters
2023-03-19 22:04:01 +01:00
Jonas Jenwald
0e54a3c37a Warn about missing/incorrect --scale-factor CSS-variable in renderTextLayer (issue 16139)
Unfortunately I don't believe that we can simply add a default `--scale-factor` CSS-variable to the `container`-element, since that might not be entirely appropriate/correct in all cases.[1]
However, we can at least print a console-error to hopefully make this situation more apparent to users. (This is purposely not using the `warn` helper-function, since those messages can be disabled.)

---
[1] One example is in our reference-tests, where we don't need to add it to the `container`-element itself.
2023-03-16 11:53:12 +01:00
calixteman
b2a86350fc
Merge pull request #16096 from bungeman/fix_trig_functions
Correct PostScript trigonometric operators
2023-03-11 14:32:23 +01:00
Calixte Denizet
07b094729e Fix search in pdf a containing some UTF-32 characters (bug 1820909)
Some chars were supposed to have a length equals to 1 but UTF-32 chars
can be longuer.
2023-03-09 15:03:01 +01:00
Calixte Denizet
b8dda089e2 Slightly modify the max width of a tracking space 2023-03-07 19:38:49 +01:00
Ben Wagner
158c836e26 Correct PostScript trigonometric operators
PDF 32000-1:2008 7.10.5.1 "Type 4 (PostScript Calculator) Functions"
defers to the PostScript Language Reference for the description of these
functions. The PostScript Language Reference, third edition chapter 8
"Operators" defines the `angle` type as a "number of degrees". Section
8.1 defines "angle `sin` real", "angle `cos` real", and "num den `atan`
angle". The documentation for `atan` further states that it will return
an angle in degrees between 0 and 360.

Handle these operators correctly in `PostScriptEvaluator.execute`.
Convert the inputs to `sin` and `cos` from degrees to radians for use
with `Math.sin` and `Math.cos`. Correctly pop two values from the stack
for `atan`, use `Math.atan2`, and convert from radians to (positive)
degrees.
2023-03-03 17:25:11 -05:00
Calixte Denizet
fd03cd5493 [api-minor] Generate images in the worker instead of the main thread.
We introduced the use of OffscreenCanvas in #14754 and this patch aims
to use them for all kind of images.
It'll slightly improve performances (and maybe slightly decrease memory use).
Since an image can be rendered in using some transfer maps but because of
OffscreenCanvas we don't have the underlying pixels array the transfer maps
stuff is re-implemented in using the SVG filter feComponentTransfer.
2023-03-01 17:40:12 +01:00
Jonas Jenwald
f42a2e8451
[api-minor] Move the canvasFactory option into getDocument
Rather than repeatedly initializing a `canvasFactory`-instance for every page, move it to the document-level instead.

*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
2023-03-01 09:07:16 +01:00
Calixte Denizet
3a21423386 [Acroform] Use the full path to find the node in the XFA datasets where to store the value
I noticed several 'Path not found' errors because of a field called #subform[2].
From the XFA specs, the hash is used for a class of elements in the template tree.
When we're looking for a node in the datasets tree, it doesn't make sense to search
for a class. Hence the path element starting with a hash are just skipped.
2023-02-23 12:09:39 +01:00
Calixte Denizet
fc7d74385f Don't replace an eol by a whitespace when the last char is a Katakana-Hiragana diacritic 2023-02-16 11:31:58 +01:00
Jonas Jenwald
6d4d402a78 Move the arrayBuffersToBytes helper function into the worker-thread
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the *built* `pdf.js` and `pdf.worker.js` files.
2023-02-11 21:34:37 +01:00
Calixte Denizet
4e9f26afa3 Ignore position of combining diacritics when getting text (bug 1640217) 2023-02-09 17:13:57 +01:00
Jonas Jenwald
90ffbc1d39 Remove most build-time require statements from the viewer (PR 16009 follow-up)
This further extends the web-specific import maps introduced in PR 16009, to allow removing *most* of the build-time `require` statements from the viewer. The few remaining ones are fallbacks used for the COMPONENTS respectively the `legacy` GENERIC builds.
2023-02-07 22:45:19 +01:00
calixteman
ecd86ccffc
Merge pull request #16020 from calixteman/bug1815476
[Annotation] Avoid to encrypt the appearance stream two times (bug 1815476)
2023-02-07 20:57:49 +01:00
Calixte Denizet
ea7b4b4d6c [Annotation] Avoid to encrypt the appearance stream two times (bug 1815476) 2023-02-07 19:26:46 +01:00
Jonas Jenwald
a98e80c4ff [GeckoView] Reduce the size of the *built* viewer
Given that the GV-viewer isn't using most of the UI-related components of the default-viewer, we can avoid including them in the *built* viewer to save space.[1]
The least "invasive" way of implementing this, at least that I could come up with, is to leverage import maps with suitable stubs for the GV-viewer.

The one slightly annoying thing is that we now have larger import maps across multiple html-files, and you'll need to remember to update all of them when making future changes.

---
[1] With this patch, the built `viewer.js` size is 391 kB and `viewer-geckoview.js` is 285 kB.
2023-02-05 14:12:32 +01:00
Tim van der Meij
e848a0e61c
Merge pull request #15981 from Snuffleupagus/cMapPacked-true
[api-minor] Let the `cMapPacked` parameter, in `getDocument`, default to `true`
2023-02-04 15:00:26 +01:00
Jonas Jenwald
851c394e64 Remove the isEmptyObj unit-test helper function
We should be able to let Jasmine simply compare directly against an actually empty Object, rather than using a manually implemented helper function for that.
2023-02-04 12:43:53 +01:00
Jonas Jenwald
c5d6391898 [api-minor] Let the cMapPacked parameter, in getDocument, default to true
The initial CMap support was added in PR 4259 using the "raw" Adobe files, however they were quickly deemed to be unnecessarily large. As a result PR 4470 introduced the more compact "binary" CMap format, with both of those PRs being included in the very same release (version `0.8.1334`) .

Please note that we've thus never shipped anything *except* the "binary" CMap files with the PDF library, and furthermore note that we've not even once updated the CMap files since they were originally added almost nine years ago.

Requiring users to remember that `cMapPacked = true` is necessary, in addition to setting the `cMapUrl` parameter, in order for CMap loading to work feels like a less than ideal API.
Hence this patch, which suggests that we simply let `cMapPacked` default to `true` now.
2023-01-30 15:35:02 +01:00
Calixte Denizet
6f4d037a8e [JS] Correctly format field with numbers (bug 1811694, bug 1811510)
In PR #15757, a value is automatically converted into a number when it's possible
but the case of numbers like "000123" has been overlooked and their format must
be preserved.
When a script is doing something like "foo.value + bar.value" and the values are
numbers then "foo.value" must return a number but the displayed value must be what
the user entered or what a script set, so this patch is just adding a a field
_orginalValue in order to track the value has it has defined.
Some people are used to use a comma as decimal separator, hence it must be considered
when a value is parsed into a number.
This patch is fixing a regression introduced by #15757.
2023-01-26 14:57:02 +01:00
Tim van der Meij
a27d7ba524
Merge pull request #15943 from Snuffleupagus/deprecate-direct-PDFDataRangeTransport
[api-minor] Deprecate calling `getDocument` directly with a `PDFDataRangeTransport`-instance
2023-01-21 13:50:20 +01:00
Calixte Denizet
dc94b750de [GV] Avoid to update the finder when the results aren't complete
At the beginning of a search we can an update can be triggered with 0 over 0
found matches.
In the GeckoView context, we can't update the finder whenever we want but only
when it has been required.
2023-01-20 18:13:16 +01:00
Jonas Jenwald
7976fc7851 [api-minor] Deprecate calling getDocument directly with a PDFDataRangeTransport-instance
In general it's recommended to pass a *parameter object* when calling the `getDocument`-function in the API, since that's the only way to provide additional options, and the fact that it also accepts a URL or TypedArray directly is now mostly for backwards compatibility reasons.
However, the `getDocument`-function also accepts a direct `PDFDataRangeTransport`-instance which just seems unnecessary.

*Please note:* The `PDFDataRangeTransport`-implementation was added specifically for the *built-in* Firefox PDF Viewer, however it's most likely not commonly used by any third-party (given that it requires manual PDF-data loading).
Furthermore, the default-viewer always provides a *parameter object* when calling the `getDocument`-function and it's thus completely unaffected by these changes.
2023-01-19 14:25:55 +01:00
Jonas Jenwald
8f3fa18c93
Merge pull request #15920 from Snuffleupagus/transfer-pdf-data
[api-minor] Enable transferring of TypedArray PDF data by default (PR 15908 follow-up)
2023-01-16 13:20:57 +01:00
Tim van der Meij
9e3adb5ec7
Implement unit tests for the numberToString utility function 2023-01-14 15:09:58 +01:00
Tim van der Meij
a6dfcc89fa
Implement unit tests for the recoverJsURL utility function 2023-01-14 15:09:58 +01:00
Jonas Jenwald
397f943ca3 [api-minor] Enable transferring of TypedArray PDF data by default (PR 15908 follow-up)
This patch removes the recently introduced `transferPdfData` API-option, and simply enables transferring of TypedArray data *by default* instead of copying it. This will help reduce main-thread memory usage, however it will take ownership of the TypedArrays. Currently this only applies to the following cases:
 - TypedArrays passed to the `getDocument`-function in the API, in order to open PDF documents from binary data.
 - TypedArrays passed to a `PDFDataRangeTransport`-instance, used to support custom PDF document fetching/loading (see e.g. the Firefox PDF Viewer).

*PLEASE NOTE:* To avoid being affected by this, please simply *copy* any TypedArray data before passing it to either of the functions/methods mentioned above.

Now that we transfer TypedArray data that we previously only copied, we need to be more careful with input validation. Given how the `{IPDFStreamReader, IPDFStreamRangeReader}.read` methods will always return ArrayBuffer data, which is then transferred to the worker-thread[1], the actual TypedArray data passed to the API thus need to have the same exact size as its underlying ArrayBuffer to prevent issues.
Hence we'll check for this and only allow transferring of *safe* TypedArray data, and fallback to simply copying the data just as before. This obviously shouldn't be an issue in the Firefox PDF Viewer, but for the general PDF.js library we need to be more careful here.

---
[1] See e09ad99973/src/display/api.js (L2492-L2506) respectively e09ad99973/src/display/api.js (L2578-L2590)
2023-01-14 10:39:36 +01:00
Jonas Jenwald
bbe629018d [api-minor] Add a new transferPdfData option to allow transferring more data to the worker-thread (bug 1809164)
Also, removes the `initialData`-parameter JSDocs for the `getDocument`-function given that this parameter has been completely unused since PR 8982 (over five years ago). Note that the `initialData`-parameter is, and always was, intended to be provided when initializing a `PDFDataRangeTransport`-instance.
2023-01-10 21:03:44 +01:00
Jonas Jenwald
1d5de9f4f4 Inline the setPDFNetworkStreamFactory functionality in src/display/api.js
Given that this is internal functionality, not exposed in the official API, it's not entirely clear (at least to me) why we can't just initialize this directly in `src/display/api.js` instead.
When testing both the development viewer and all the ways in which we run tests, everthing still appears to work just fine with this patch.
2023-01-06 13:23:07 +01:00
Calixte Denizet
661f425934 [GV] Add an option in the find controller to update matches count only when the last page is reached (bug 1803188).
In GeckoView, on an event, a callback must be executed with the result of an action,
but the callback can be used only one time.
So for each FindInPage event, we must trigger only one matches count update.
2023-01-06 10:56:26 +01:00
Jonas Jenwald
42aa08563b
Merge pull request #15880 from Snuffleupagus/rm-docStats
[api-minor] Remove the `PDFDocumentProxy.stats` getter (PR 15758 follow-up)
2023-01-02 16:03:53 +01:00
Calixte Denizet
69c88477a9 Avoid an infinite loop when searching for a single diacritic 2023-01-02 12:27:07 +01:00
Jonas Jenwald
0c1fb4e740 [api-minor] Remove the PDFDocumentProxy.stats getter (PR 15758 follow-up)
This was deprecated in PR 15758 and given that it's quite unlikely that any third-party users are relying on this functionality, since it was only ever added to support telemetry reporting in the Firefox PDF Viewer, it should hopefully be fine to remove this fairly quickly.

These changes reduce the bundle size of the Firefox PDF Viewer by 4.5 kB in total.
2023-01-01 17:06:47 +01:00
Jonas Jenwald
91524d1a60 [api-minor] Allow specifying an extra-delay, in RenderTask.cancel, for worker-thread aborting of operatorList parsing
This is done to support upcoming viewer-changes, and in order to prevent third-party users from outright breaking things we'll simply ignore too large values.
2022-12-14 12:34:16 +01:00
Jonas Jenwald
fe8fded23b [api-minor] Combine the textContent/textContentStream parameters
Rather than handling these parameters separately, which is a left-over from back when streaming of textContent was originally added, we can simply pass either data directly to the `TextLayer` and let it handle things accordingly.

Also, improves a few JSDoc comments and `typedef`-imports.
2022-12-04 21:22:14 +01:00
Tim van der Meij
99cfef882f
Merge pull request #15752 from Snuffleupagus/no-typeof-undefined
Enable the `no-typeof-undefined` ESLint plugin rule
2022-12-02 19:48:16 +01:00
Calixte Denizet
eed9bf71c5 Refactor the text layer code in order to avoid to recompute it on each draw
The idea is just to resuse what we got on the first draw.
Now, we only update the scaleX of the different spans and the other values
are dependant of --scale-factor.
Move some properties in the CSS in order to avoid any updates in JS.
2022-12-01 18:42:43 +01:00
Jonas Jenwald
47dbfc4ade Enable the no-typeof-undefined ESLint plugin rule
Please see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/no-typeof-undefined.md
2022-12-01 18:20:39 +01:00
Calixte Denizet
ea1995991b Don't add an extra space after a Katakana or a Hiragana at the eol when searching 2022-11-29 10:46:48 +01:00
Calixte Denizet
ae7da6ae48 [JS] By default, a text field value must be treated as a number (bug 1802888) 2022-11-28 16:24:01 +01:00
Calixte Denizet
4ee0c83548 [JS] Fix a rounding issue in printf (bug 1802888) 2022-11-28 14:37:15 +01:00
Jonas Jenwald
0ba242ea4a Support FileAttachments with hash-signs in the filename (issue 15729)
The reason for the issue is that we use the generic `getFilenameFromUrl` helper function, which was originally intended for regular URLs.
For the filenames we're dealing with in FileAttachments, we really only want to strip the path when one exists[1].

---
[1] See [bug 1230933](https://bugzilla.mozilla.org/show_bug.cgi?id=1230933) for an example of such a case.
2022-11-23 10:47:33 +01:00
Jonas Jenwald
7d029f8bfe Add a basic stringToUTF16HexString unit-test 2022-11-16 12:39:35 +01:00
Jonas Jenwald
9adc7859c8 Move the escapeString helper function into the worker-thread
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.
2022-11-16 12:35:48 +01:00
Jonas Jenwald
e5859e145d Move the isAscii helper function into the worker-thread
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.
2022-11-16 12:35:48 +01:00
Jonas Jenwald
2eaa708e3a Combine the stringToUTF16String and stringToUTF16BEString helper functions
Given that these functions are virtually identical, with the latter only adding a BOM, we can combine the two. Furthermore, since both functions were only used on the worker-thread, there's no reason to duplicate this functionality in both of the `pdf.js` and `pdf.worker.js` files.
2022-11-16 12:35:44 +01:00
Calixte Denizet
2be64d63e1 Normalize fullwidth, halfwidth and circled chars when searching 2022-11-14 19:27:51 +01:00
Calixte Denizet
3ca03603c2 [Annotation] Fix printing/saving for annotations containing some non-ascii chars and with no fonts to handle them (bug 1666824)
- For text fields
 * when printing, we generate a fake font which contains some widths computed thanks to
   an OffscreenCanvas and its method measureText.
   In order to avoid to have to layout the glyphs ourselves, we just render all of them
   in one call in the showText method in using the system sans-serif/monospace fonts.
 * when saving, we continue to create the appearance streams if the fonts contain the char
   but when a char is missing, we just set, in the AcroForm dict, the flag /NeedAppearances
   to true and remove the appearance stream. This way, we let the different readers handle
   the rendering of the strings.
- For FreeText annotations
  * when printing, we use the same trick as for text fields.
  * there is no need to save an appearance since Acrobat is able to infer one from the
    Content entry.
2022-11-10 19:05:39 +01:00
Samuel Yuan
36fb5c1e2b Propagate the translated font name to TextContentItems.
This allows font data for system fonts to be looked up in the
PDFObjects.
2022-11-08 11:16:21 -08:00
Jonas Jenwald
23930a249e [api-minor] Let Catalog.getAllPageDicts return an *empty* dictionary when loading the first /Page fails (issue 15590)
In order to support opening certain corrupt PDF documents, particularly hand-edited ones, this patch adds support for letting the `Catalog.getAllPageDicts` method fallback to returning an *empty* dictionary to replace (only) the first /Page of the document.
Given that the viewer cannot initialize/load without access to the first page, this will thus allow e.g. document-level scripting to run as expected. Note that by effectively replacing a corrupt or missing first /Page in this way[1], we'll now render nothing but a *blank* page for certain cases of broken/corrupt PDF documents which may look weird.

*Please note:* This functionality is controlled via the existing `stopAtErrors` option, that can be passed to `getDocument`, since it's easy to imagine use-cases where this sort of fallback behaviour isn't desirable.

---
[1] Currently we still require that a /Pages-dictionary is found though, however it *may* be possible to relax even that assumption if that becomes absolutely necessary in future corrupt documents.
2022-11-03 12:51:48 +01:00
Jonas Jenwald
2516ffa78e Fallback to finding the first "obj" occurrence, when the trailer-dictionary is incomplete (issue 15590)
Note that the "trailer"-case is already a fallback, since normally we're able to use the "xref"-operator even in corrupt documents. However, when a "trailer"-operator is found we still expect "startxref" to exist and be usable in order to advance the stream position. When that's not the case, as happens in the referenced issue, we use a simple fallback to find the first "obj" occurrence instead.

This *partially* fixes issue 15590, since without this patch we fail to find any objects at all during `XRef.indexObjects`. However, note that the PDF document is still corrupt and won't render since there's no actual /Pages-dictionary and the /Root-entry simply points to the /OpenAction-dictionary instead.
2022-11-03 12:46:30 +01:00
Jonas Jenwald
71bd8b4de9 Let Lexer.getNumber treat more invalid "numbers" as zero (issue 15604)
In the referenced PDF document there are "numbers" which consist only of `-.`, and while that's obviously not valid Adobe Reader seems to handle it just fine.
Letting this method ignore more invalid "numbers" was suggested during the review of PR 14543, so let's simply relax our the validation here.
2022-10-20 22:36:15 +02:00
Calixte Denizet
e756bb69e4 [JS] Take into account all the required fields for some computations
- Fix Field::getArray in order to collect only the fields which have a value;
- Fix AFSimple_Calculate:
  * allow to have a string with a list of field names as argument;
  * since a field can be non-terminal, use Field::getArray to collect
    the field under it and then apply the calculation on all the descendants.
2022-10-13 18:33:12 +02:00
Jonas Jenwald
ce66fefbff [api-minor] Add partial support for the "GoToE" action (issue 8844)
*Please note:* The referenced issue is the only mention that I can find, in either GitHub or Bugzilla, of "GoToE" actions.
Hence why I've purposely settled for a very simple, and partial, "GoToE" implementation to avoid complicating things initially.[1] In particular, this patch only supports "GoToE" actions that references the /EmbeddedFiles-dict in the PDF document.

See https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2048909

---
[1] Usually I always prefer having *real-world* test-cases to work with, whenever I'm implementing new features.
2022-10-06 10:33:07 +02:00
Jonas Jenwald
fe5d9b4b6a Remove duplicated destroy-calls in the "custom ownerDocument" unit-tests
Given that `PDFDocumentProxy.destroy` is nothing but an alias for `PDFDocumentLoadingTask.destroy` calling both methods is obviously not useful.
2022-10-02 12:01:41 +02:00
Jonas Jenwald
7b24931f67
Merge pull request #15517 from Snuffleupagus/issue-15516
Add more non-standard ligatures in the `glyphlist.js` file (issue 15516)
2022-09-30 23:30:50 +02:00
Calixte Denizet
330048ad6b [JS] Add the function AFExactMatch 2022-09-29 14:23:56 -10:00
Jonas Jenwald
c87f90102c Add more non-standard ligatures in the glyphlist.js file (issue 15516)
Note that this PR only adds the "underscore"-variant of *actually existing* ligatures, however the referenced PDF document also uses a couple of non-standard ones (e.g. `ft`, `Th`, and `fh`) that we cannot easily support without larger changes (since they don't have official Unicode-entries).
Given that it's clearly the PDF document, and its fonts, that's the culprit here it's not entirely clear to me that we actually want to attempt a larger refactoring/rewriting of the `glyphlist.js` code, assuming it's even generally possible. Especially when this patch alone already improves our copy-paste behaviour when compared to both Adobe Reader and PDFium, and that this is only the *second* time this sort of bug has been reported.
2022-09-27 16:31:51 +02:00
calixteman
da1780f826
Merge pull request #15486 from nmtigor/fix_orders_of_prop
Fix property chain orders of Operators in isDotExpression
2022-09-25 04:13:25 -10:00
Jonas Jenwald
6538409282 Replace some Array.prototype-usage with spread syntax
We have a few, quite old, call-sites that use the `Array.prototype`-format and which can now be replaced with spread syntax instead.
2022-09-23 09:35:30 +02:00
Calixte Denizet
9e40938a29 [JS] Try to guess what the date is when it doesn't follow the given format (issue #15490)
We use the format to guess in which order we can find month, day, ... we get the numbers
in the date and consider them as month, day, ...
2022-09-22 16:30:39 +02:00
nmtigor
22cc9b7dc7 Fix property chain orders of Operators in isDotExpression and isSomPredicate 2022-09-21 17:20:23 +02:00
Calixte Denizet
f5b835157b [XFA] Fix an hidden issue in the FormCalc lexer
Since there are no script engine with XFA, the FormCalc parser is not used irl.
The bug @nmtigor noticed was hidden by another one (the wrong check on `match`).
2022-09-20 13:53:55 +02:00
Jonas Jenwald
21fe5017bb Remove the abstract BaseViewer-class
After the changes in PR 14112 the `PDFViewer`-class is now "identical" to the `BaseViewer`-class and the `PDFSinglePageViewer`-class is just a very thin wrapper around the `BaseViewer`-class.
Hence we can rename these files, and also remove the abstract `BaseViewer`-class, which helps reduce some unnecessary "closures" in the *built* viewer.

*Please note:* These changes are made in two separate commits, to allow GitHub to preserve `blame` for the affected files.
2022-09-08 12:38:17 +02:00
Jonas Jenwald
6dc4c994b8 Remove the abstract BaseViewer-class
After the changes in PR 14112 the `PDFViewer`-class is now "identical" to the `BaseViewer`-class and the `PDFSinglePageViewer`-class is just a very thin wrapper around the `BaseViewer`-class.
Hence we can rename these files, and also remove the abstract `BaseViewer`-class, which helps reduce some unnecessary "closures" in the *built* viewer.

*Please note:* These changes are made in two separate commits, to allow GitHub to preserve `blame` for the affected files.
2022-09-08 12:38:17 +02:00
Calixte Denizet
6c6f6fb2b8 Don't replace cr by a white space when the last char on the line is an ideographic char 2022-09-04 14:21:05 +02:00
Jonas Jenwald
cc4baa2fe9 [api-minor] Add basic support for the SetOCGState action (issue 15372)
Note that this patch implements the `SetOCGState`-handling in `PDFLinkService`, rather than as a new method in `OptionalContentConfig`[1], since this action is nothing but a series of `setVisibility`-calls and that it seems quite uncommon in real-world PDF documents.

The new functionality also required some tweaks in the `PDFLayerViewer`, to ensure that the `layersView` in the sidebar is updated correctly when the optional-content visibility changes from "outside" of `PDFLayerViewer`.

---
[1] We can obviously move this code into `OptionalContentConfig` instead, if deemed necessary, but for an initial implementation I figured that doing it this way might be acceptable.
2022-09-01 17:34:24 +02:00
Jonas Jenwald
216b86a082 [api-minor] Support Named-actions in the outline (issue 15367)
Apparently this is implemented in e.g. Adobe Reader, and the specification does support it, however it cannot be commonly used in real-world PDF documents since it took over ten years for this feature to be requested.
2022-08-30 18:47:45 +02:00
Calixte Denizet
c06c5f7cbd [Annotations] charLimit === 0 means unlimited (bug 1782564)
Changing the charLimit in JS had no impact, so this patch aims to fix
that and add an integration test for it.
2022-08-19 11:28:28 +02:00
Jonas Jenwald
0024165f1f Move binarySearchFirstItem back to the web/-folder (PR 15237 follow-up)
This was moved into the `src/display/`-folder in PR 15110, for the initial editor-a11y patch. However, with the changes in PR 15237 we're again only using `binarySearchFirstItem` in the `web/`-folder and it thus seem reasonable to move it back there.
The primary reason for moving it back is that `binarySearchFirstItem` is currently exposed in the public API, and we always want to avoid that unless it's either PDF-related functionality or code that simply must be shared between the `src/`- and `web/`-folders. In this case, `binarySearchFirstItem` is a general helper function that doesn't really satisfy either of those alternatives.
2022-08-14 11:38:17 +02:00
Jonas Jenwald
dd95e4f851 Add *official* support for passing ArrayBuffer-data to getDocument (issue 15269)
While this has always worked, as a consequence of the implementation, it's never been officially supported.
In addition to adding basic unit-tests, this patch also introduces a couple of new JSDoc `@typedef`s in the API to avoid overly long lines.
2022-08-10 14:13:01 +02:00
Jonas Jenwald
f6db7975c5 Enable the ESLint prefer-spread rule
Note that in a couple of spots the argument could be `undefined` and there we simply disable the rule instead.

Please refer to https://eslint.org/docs/latest/rules/prefer-spread
2022-08-06 10:17:00 +02:00
calixteman
b985eaa98c
Merge pull request #15267 from calixteman/freetext_a11y
[Annotation] Add a div containing the text of a FreeText annotation (bug 1780375)
2022-08-04 11:49:29 +02:00
Calixte Denizet
31155740c3 [Annotation] Add a div containing the text of a FreeText annotation (bug 1780375)
An annotation doesn't have to be in the text flow, hence it's likely a bad idea
to insert its text in the text layer. But the text must be visible from a screen
reader point of view so it must somewhere in the DOM.
So with this patch, the text from a FreeText annotation is extracted and added in
a div in its HTML counterpart, and with the patch #15237 the text should be visible
and positioned relatively to the text flow.
2022-08-04 11:14:05 +02:00
Calixte Denizet
6916fabd51 Skip unknown fields when calculating a value in using AFSimple_Calculate 2022-08-03 23:40:09 +02:00
Jonas Jenwald
0c31320c12 [api-minor] Improve thumbnail handling in documents that contain interactive forms
To improve performance of the sidebar we use the page-canvases to generate the thumbnails whenever possible, since that avoids unnecessary re-rendering when the sidebar is open. This works generally well, however there's an old problem in PDF documents that contain interactive forms (when those are enabled): Note how the thumbnails become partially (or fully) blank, since those Annotations are not included in the OperatorList.[1]

We obviously want to keep using the `PDFThumbnailView.setImage`-method for most documents, however we need a way to skip it only for those pages that contain interactive forms.
As it turns out it's unfortunately not all that simple to tell, after the fact, from looking only at the OperatorList that some Annotations were skipped. While it might have been possible to try and infer that in the viewer, it'd not have been pretty considering that at the time when rendering finishes the annotationLayer has not yet been built.
The overall simplest solution that I could come up with, was instead to include a *summary* of the interactive form-state when doing the final "flushing" of the OperatorList and expose that information in the API.

---
[1] Some examples from our test-suite: `annotation-tx2.pdf` where the thumbnail is completely blank, and `bug1737260.pdf` where the thumbnail is missing the "buttons" found on the page.
2022-07-30 16:53:32 +02:00
Jonas Jenwald
2fb083f3e2 Ensure that the isUsingOwnCanvas-parameter is consistently included in operatorLists (PR 14247 follow-up)
Currently some `OPS.beginAnnotation` arguments will contain a `Number` value for the `isUsingOwnCanvas`-parameter, or in some cases an `undefined` value, which is inconsistent from an API perspective.
2022-07-28 13:37:37 +02:00
Calixte Denizet
7831a100b3 [Editor] Add the possibility to change line opacity in Ink editor 2022-07-27 18:46:25 +02:00
Calixte Denizet
af41a5cb49 [Editor] Simplify the command manager
The previous version was maybe functional but definitely painful to maintain
(maybe more efficient... I don't know) so this patch aims to simplify it and
it adds some basic unit tests.
2022-07-21 18:44:41 +02:00
Jonas Jenwald
f46895d750
Merge pull request #15110 from calixteman/editing_a11y
[Editor] Improve a11y for newly added element (#15109)
2022-07-19 20:02:53 +02:00
Calixte Denizet
624b26e1de [Editor] Improve a11y for newly added element (#15109)
- In the annotationEditorLayer, reorder the editors in the DOM according
  the position of the elements on the screen;
- add an aria-owns attribute on the "nearest" element in the text layer
  which points to the added editor.
2022-07-19 18:52:17 +02:00
Jonas Jenwald
37ebc28756 Use more for...of loops in the code-base
Note that these cases, which are all in older code, were found using the [`unicorn/no-for-loop`](https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/no-for-loop.md) ESLint plugin rule.
However, note that I've opted not to enable this rule by default since there's still *some* cases where I do think that it makes sense to allow "regular" for-loops.
2022-07-17 16:18:54 +02:00
Jonas Jenwald
c2f7942aea Ensure that the /Resources-entry is actually a dictionary (issue 15150)
Prevent issues in *corrupt* PDF documents, if the /Resources-entry is not of the correct and expected type.
2022-07-08 12:43:43 +02:00
Jonas Jenwald
345bb18575 [editor] Use the fit-curve package (issue 15004)
Rather than including all of this external code in the PDF.js repository, we should be using the npm package instead.
Unfortunately this is slightly more complicated than you'd hope, since the `fit-curve` package (which is older) isn't directly compatible with modern JavaScript modules.
In particular, the following cases needed to be considered:
 - For the development viewer (i.e. `gulp server`) and the unit-tests, we thus need to build a fitCurve-bundle that can be directly `import`ed.
 - For the actual PDF.js build-targets, we can slightly reduce the sizes by depending on the "raw" `fit-curve` source-code.
 - For the Node.js unit-tests, the `fit-curve` package can be used as-is.
2022-07-07 10:43:43 +02:00
Calixte Denizet
1a3ef2a0aa [editor] Add some UI elements in order to set font size & color, and ink thickness & color 2022-06-28 12:05:04 +02:00
Calixte Denizet
3789dab307 Always flush the current item with MarkedContent stuff when getting text (#15094) 2022-06-25 17:19:57 +02:00
calixteman
23fcdabb37
Merge pull request #15088 from calixteman/editor_rotation
Support rotating editor layer
2022-06-25 16:18:07 +02:00
Calixte Denizet
0c420f5135 Support rotating editor layer
- As in the annotation layer, use percent instead of pixels as unit;
- handle the rotation of the editor layer in allowing editing when rotation
  angle is not zero;
- the different editors are rotated counterclockwise in order to be usable
  when the main page is itself rotated;
- add support for saving/printing rotated editors.
2022-06-24 20:02:32 +02:00
Calixte Denizet
6e46226cd7 Fix unit test (#15093 follow-up) 2022-06-24 18:55:35 +02:00
Jonas Jenwald
1cc7cecc7b [api-minor] Introduce a PrintAnnotationStorage with *frozen* serializable data
Given that printing is triggered *synchronously* in browsers, it's thus possible for scripting (in PDF documents) to modify the Annotation-data while printing is currently ongoing.
To work-around that we add a new printing-specific `AnnotationStorage`, where the serializable data is *frozen* upon initialization, which the viewer can thus create/utilize during printing.
2022-06-23 17:06:46 +02:00
Calixte Denizet
30c63eb0ec [Editor] Add support for printing newly added FreeText annotations 2022-06-22 13:26:09 +02:00
Calixte Denizet
f27c8c4471 [Editor] Add support for printing newly added Ink annotations 2022-06-21 18:21:49 +02:00
Calixte Denizet
cdc58b7a52 Rotate annotations based on the MK::R value (bug 1675139)
- it aims to fix: https://bugzilla.mozilla.org/show_bug.cgi?id=1675139;
- An annotation can be rotated (counterclockwise);
- the rotation can be set in using JS.
2022-06-21 17:57:26 +02:00
Jonas Jenwald
8129815538 Enable the unicorn/prefer-dom-node-append ESLint plugin rule
This rule will help enforce slightly shorter code, especially since you can insert multiple elements at once, and according to MDN `Element.append()` is available in all browsers that we currently support.

Please find additional information here:
 - https://developer.mozilla.org/en-US/docs/Web/API/Element/append
 - https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-dom-node-append.md
2022-06-12 13:07:03 +02:00
Jonas Jenwald
bbf857d635 [api-minor] Stop using the beginAnnotations/endAnnotations operators (PR 14998 follow-up)
After the changes in PR 14998, these operators are now no-ops in the `src/display/canvas.js` code and should no longer be necessary.
Given that `beginAnnotations`/`endAnnotations` are not in the PDF specification, but are rather *custom* PDF.js operators, it seems reasonable to stop using them now that they've become no-ops.
2022-06-11 14:21:26 +02:00
Tim van der Meij
a57a4bc6c2
Merge pull request #15018 from Snuffleupagus/issue-15016
Expose `TextLayerRenderTask` in the TypeScript definitions (issue 15016, PR 14013 follow-up)
2022-06-10 22:18:35 +02:00
Jonas Jenwald
e046b811b7 Expose TextLayerRenderTask in the TypeScript definitions (issue 15016, PR 14013 follow-up)
While `TextLayerRenderTask` apparently makes sense in TypeScript environments, given that it's being returned by the `renderTextLayer`-function in the API, we really don't want to extend the *public* API by simply exporting the class directly in `src/pdf.js` since it should never be called/initialized manually.
Hence we follow the same pattern as in PR 14013, and add some very basic unit-tests to ensure that `renderTextLayer` always returns a `TextLayerRenderTask`-instance as expected.
2022-06-10 22:12:32 +02:00
Jonas Jenwald
9ac4536693 Enable the unicorn/prefer-at ESLint plugin rule (PR 15008 follow-up)
Please find additional information here:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/at
 - https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-at.md
2022-06-09 21:21:19 +02:00
Calixte Denizet
36aae436bf [editor] Add support for saving newly added Ink 2022-06-08 22:16:01 +02:00
Calixte Denizet
7773b3f5be [edition] Add support for saving a newly added FreeText 2022-06-08 14:34:09 +02:00
Calixte Denizet
c7afce4210 Support Hangul syllables when searching some text (bug 1771477)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1771477;
- hangul contains some syllables which are decomposed when using NFD, hence
  the text must be correctly shifted in case it contains some of them.
2022-05-28 16:50:03 +02:00
Calixte Denizet
60498c67e4 Display background when printing or saving a text widget (issue #14928) 2022-05-19 16:41:54 +02:00
Jonas Jenwald
6bcc5b615d [api-minor] Include line endings in Line/Polyline Annotation-data (issue 14896)
Please refer to:
 - https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2109792
 - https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2096489
 - https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2096447

Note that we still won't attempt to use the /LE-data when creating fallback appearance streams, as mentioned in PR 13448, since custom line endings aren't common enough to warrant the added complexity.
Finally, note that according to the PDF specification we should *potentially* also take the line endings into account for FreeText Annotations. However, in that case their use is conditional on other parameters that we currently don't support.
2022-05-12 11:08:30 +02:00
Jonas Jenwald
8267fd8a52 Replace the AnnotationStorage.lastModified-getter with a proper hash-method
The current `lastModified`-getter, which only contains a time-stamp, is a fairly crude way of detecting if the stored data has actually been changed. In particular, when the `getRawValue`-method is used, the `lastModified`-getter doesn't cope with data being modified from the "outside".

To fix these issues[1], and to prevent any future bugs in this code, this patch introduces a new `AnnotationStorage.hash`-getter which computes a hash of the currently stored data. To simplify things this re-uses the existing `MurmurHash3_64`-implementation, which required moving that file into the `src/shared/`-folder, since its performance should be good enough here.

---
[1] Given how the `AnnotationStorage.lastModified`-getter was used, this would have been limited to *printing* of forms.
2022-05-04 15:21:30 +02:00
Jonas Jenwald
8135d7ccf6
Merge pull request #14869 from calixteman/14862
[JS] Fix few bugs present in the pdf for issue #14862
2022-05-03 18:31:31 +02:00
Calixte Denizet
094ff38da0 [JS] Fix few bugs present in the pdf for issue #14862
- since resetForm function reset a field value a calculateNow is consequently triggered.
  But the calculate callback can itself call resetForm, hence an infinite recursive loop.
  So basically, prevent calculeNow to be triggered by itself.
- in Firefox, the letters entered in some fields were duplicated: "AaBb" instead of "AB".
  It was mainly because beforeInput was triggering a Keystroke which was itself triggering
  an input value update and then the input event was triggered.
  So in order to avoid that, beforeInput calls preventDefault and then it's up to the JS to
  handle the event.
- fields have a property valueAsString which returns the value as a string. In the
  implementation it was wrongly used to store the formatted value of a field (2€ when the user
  entered 2). So this patch implements correctly valueAsString.
- non-rendered fields can be updated in using JS but when they're, they must take some properties
  in the annotationStorage. It was implemented for field values, but it wasn't for
  display, colors, ...
- it fixes #14862 and #14705.
2022-05-03 15:48:44 +02:00
Jonas Jenwald
df5a4fd0a7 Support encoded dest-strings in /GoTo destination dictionaries (issue 14864)
Interestingly enough this appears to be the very first case of *encoded* dest-strings, in /GoTo destination dictionaries, that we've actually come across. What's really fascinating is that it's less than a week after issue 14847, given that these issues are *somewhat* similar.
2022-05-02 10:14:32 +02:00
Jonas Jenwald
fbf6dee8ee [api-minor] Remove the forceClamped-functionality in the Streams (issue 14849)
As it turns out, most of the code-paths in the `PDFImage`-class won't actually pass the TypedArray (containing the image-data) to the `ColorSpace`-code. Hence we *generally* don't need to force the image-data to be a `Uint8ClampedArray`, and can just as well directly use a `Uint8Array` instead.

In the following cases we're returning the data without any `ColorSpace`-parsing, and the exact TypedArray used shouldn't matter:
 - b72a448327/src/core/image.js (L714)
 - b72a448327/src/core/image.js (L751)

In the following cases the image-data is only used *internally*, and again the exact TypedArray used shouldn't matter:
 - b72a448327/src/core/image.js (L762) with the actual image-data being defined (as `Uint8ClampedArray`) further below
 - b72a448327/src/core/image.js (L837)

*Please note:* This is tagged `api-minor` because it's API-observable, given that *some* image/mask-data will now be returned as `Uint8Array` rather than using `Uint8ClampedArray` unconditionally. However, that seems like a small price to pay to (slightly) reduce memory usage during image-conversion.
2022-04-29 14:46:30 +02:00
Jonas Jenwald
71370d012b Support destinations in NameTrees with encoded keys (issue 14847)
Initially I considered updating the `NameOrNumberTree`-implementation to handle encoded keys, however that quickly became somewhat messy (especially in the `NameOrNumberTree.get`-method) since only NameTrees using string-keys.
Hence the easiest solution, as far as I'm concerned, was thus to just update the `Catalog.destinations`-getter instead. Please note that in the referenced PDF document the `Catalog.destination`-method will thus fallback to fetch all destinations, which should be fine since this is the very first case of encoded keys that we've seen.

Also changes the `NameOrNumberTree.getAll`-method to prevent a possible run-time error, although we've so far not seen such a case, for any non-Array Kids-entries found in a NameTree/NumberTree.

Finally, to improve overall consistency and to hopefully prevent future bugs, the patch also updates a couple of other `NameTree` call-sites to correctly handle encoded keys. (Note that the `Catalog.attachments`-getter was already doing this.)
2022-04-27 11:19:55 +02:00
Calixte Denizet
040fcae5ab Improve performance with image masks (bug 857031)
- it aims to partially fix performance issue reported: https://bugzilla.mozilla.org/show_bug.cgi?id=857031;
- the idea is too avoid to use byte arrays but use ImageBitmap which are a way faster to draw:
  * an ImageBitmap is Transferable which means that it can be built in the worker instead of in the main thread:
    - this is achieved in using an OffscreenCanvas when it's available, there is a bug to enable them
      for pdf.js: https://bugzilla.mozilla.org/show_bug.cgi?id=1763330;
    - or in using createImageBitmap: in Firefox a task is sent to the main thread to build the bitmap so
      it's slightly slower than using an OffscreenCanvas.
  * it's transfered from the worker to the main thread by "reference";
  * the byte buffers used to create the image data have a very short lifetime and ergo the memory used is globally
    less than before.
- Use the localImageCache for the mask;
- Fix the pdf issue4436r.pdf: it was expected to have a binary stream for the image;
- Move the singlePixel trick from operator_list to image: this way we can use this trick even if it isn't in a set
  as defined in operator_list.
2022-04-09 18:26:26 +02:00
Jonas Jenwald
addb4cb12b Use String.prototype.repeat() in a couple of spots
Rather than using a temporary Array to manually create repeated strings, we can use `String.prototype.repeat()` instead.
The reason that we didn't use this from the start is most likely because some browsers, notably IE, didn't support this; note https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/repeat#browser_compatibility
2022-03-30 15:42:40 +02:00
Calixte Denizet
ad3fb71a02 [Annotations] Add support for printing/saving choice list with multiple selections
- it aims to fix issue #12189.
2022-03-29 18:59:44 +02:00
Calixte Denizet
18e79e3c0b [text selection] Add the whitespaces present in the pdf in the text chunk
- it aims to fix issue #14627;
- the basic idea of the recent text refactoring was to only consider the rendered visible whitespaces.
  But sometimes, the heuristics aren't correct and although some whitespaces are in the text stream
  they weren't in the text chunks because they were too small. Hence we added some exceptions, for example,
  we always add a whitespace when it is between two non-whitespace chars but only when in the same Tj.
  So basically, this patch removes the constraint to have the chars in the same Tj
  (in using a circular buffer to save the two last chars) but don't add a space when the visible space is really
  too small (hence `NOT_A_SPACE_FACTOR`).
2022-03-27 14:34:56 +02:00
Jonas Jenwald
849de5a508 Slightly improve validation of (some) parameters in getDocument
There's a couple of `getDocument` parameters that should be numbers, but which are currently not *fully* validated to prevent issues elsewhere in the code-base.
Also, improves validation of the `ownerDocument` parameter since we currently accept more-or-less anything here.
2022-03-21 13:32:17 +01:00
Calixte Denizet
f0b549c2a2 [JS] - Parse a date in using the given format first and then try the default date parser
- it aims to fix #14672.
2022-03-19 16:07:43 +01:00
Jonas Jenwald
c0736647f9 Add general iteration support in the RefSet and RefSetCache classes
This patch removes the existing `forEach` methods, in favor of making the classes properly iterable instead. Given that the classes are using a `Set` respectively a `Map` internally, implementing this is very easy/efficient and allows us to simplify some existing code.
2022-03-18 14:27:34 +01:00
Jonas Jenwald
fb345ee184 Enable the "gets fieldObjects" unit-test in Node.js (PR 14409 follow-up)
Apparently this unit-test works in Node.js now, hence it's *possible* that the reason it didn't work previously is that there were bugs in our old `structuredClone` polyfill.
2022-03-13 10:40:57 +01:00
Tim van der Meij
bcf453cf14
Merge pull request #14656 from Snuffleupagus/mv-isSameOrigin
Move the `isSameOrigin` helper function
2022-03-11 21:08:49 +01:00
Jonas Jenwald
537ed37835 Move the isSameOrigin helper function
This function is currently placed in the `src/shared/util.js` file, which means that the code is duplicated in both of the *built* `pdf.js` and `pdf.worker.js` files. Furthermore, it only has a single call-site which is also specific to the `GENERIC`-build of the PDF.js library.

Hence this helper function is instead moved into the `src/display/api.js` file, in such a way that it's conditionally defined but still can be unit-tested.
2022-03-10 13:51:09 +01:00
Jonas Jenwald
e08e3f4d37 Replace XMLHttpRequest usage with the Fetch API in send (in test/unit/testreporter.js)
Besides converting the `send` function to use the Fetch API, this patch also changes the method to return a `Promise` to get rid of the callback function. (Although, currently there's no call-site passing in a callback function.)
2022-03-10 12:55:08 +01:00
Jonas Jenwald
99cd24ce3e Remove the isString helper function
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isString`-calls.
2022-02-26 16:33:41 +01:00
Tim van der Meij
cf7ce0aa7e
Merge pull request #14600 from Snuffleupagus/getPageIndex-more-validation
[api-minor] Add validation for the  `PDFDocumentProxy.getPageIndex` method
2022-02-26 15:30:00 +01:00
Jonas Jenwald
172d007598 [api-minor] Add validation for the PDFDocumentProxy.getPageIndex method
Currently we'll happily attempt to send any argument passed to this method over to the worker-thread, without doing any sort of validation.
That could obviously be quite bad, since there's first of all no protection against sending unclonable data. Secondly, it's also possible to pass data that will cause the `Ref.get` call in the worker-thread to fail immediately.

In order to address all of these issues, we'll now properly validate the argument passed to `PDFDocumentProxy.getPageIndex` and when necessary reject already on the main-thread instead.
2022-02-24 12:01:51 +01:00
Jonas Jenwald
2be8036eb7 [api-minor] Reduce duplication in the "gets non-existent page" unit-test 2022-02-24 11:25:21 +01:00
Jonas Jenwald
ec87995050 Ensure that Cmd/Name is only initialized with string arguments
Trying to use a non-string argument in either a `Cmd` or a `Name` is not intended, and would basically be an implementation error. Hence we can add a non-PRODUCTION check to enforce this, similar to the existing one used e.g. in the `Dict.set` method.
2022-02-23 22:39:12 +01:00
Tim van der Meij
2bb96a708c
Merge pull request #14598 from Snuffleupagus/rm-isBool
Re-factor the `Catalog.viewerPreferences` method and remove the `isBool` helper function
2022-02-23 20:36:56 +01:00
Jonas Jenwald
3704283f5b Remove the isBool helper function
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls.
2022-02-23 13:31:03 +01:00