Sakurai/pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	9dfe9c552c	Use shorter arrow functions where possible For arrow functions that are both simple and short, we can avoid using explicit `return` to shorten them even further without hurting readability. For the `gulp mozcentral` build-target this reduces the overall size of the output by just under 1 kilo-byte (which isn't a lot but still can't hurt).	2024-01-21 10:13:12 +01:00
Jonas Jenwald	4d19db0b19	Re-format the code to account for `prettier` and `globals` updates The `prettier` update slightly changed the formatting of some await-expressions; please see https://github.com/prettier/prettier/blob/main/CHANGELOG.md#302 The `globals` update removed the need for some eslint-disable statements; please see https://github.com/sindresorhus/globals/releases/tag/v13.21.0	2023-08-19 09:30:34 +02:00
Jonas Jenwald	3a886e7264	Move the `isNodeJS`-helper into the `src/shared/util.js` file With the changes in the previous patch the `isNodeJS`-helper no longer needs to live in its own file, which helps get rid of a closure in the built files.	2023-07-17 16:42:25 +02:00
Calixte Denizet	117bbf7cd9	[api-minor] Don't normalize the text used in the text layer. Some Arabic chars like \ufe94 could be searched in a pdf, hence they must be normalized when creating the search query. So to avoid duplicating the normalization code, everything is moved into the find controller. The previous code to normalize text was using NFKC but with a hardcoded map, hence it has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size by 30kb). In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into account some RTL unicode ranges, the generated font wasn't embedding the mapping for this char and the unicode ranges in the OS/2 table weren't up-to-date. When normalized, some chars can be replaced by several ones, which led to some extra chars in the text layer. To avoid any regression, when copying some text from the text layer, the copied string is normalized (NFKC) before being put in the clipboard (it works like this in either Acrobat or Chrome).	2023-04-17 14:31:23 +02:00
Jonas Jenwald	0e19c3a120	[api-minor] Add support, in `PDFFindController`, for mixing phrase/word searches (issue 7442) Please note: This patch only extends the `PDFFindController` implementation itself to support this functionality, however it's purposely not exposed in the default viewer. This replaces the previous `phraseSearch`-parameter, and a `query`-string will now always be interpreted as a phrase-search. To enable searching for individual words, the `query`-parameter must instead consist of an Array of strings. This way it's now also possible to combine phrase/word searches, with a `query`-parameter looking something like `["Lorem ipsum", "foo", "bar"]` which will search for the phrase "Lorem ipsum" and the words "foo" respectively "bar".	2023-04-15 13:32:37 +02:00
Calixte Denizet	d8795f9f8f	Fix search of numbers inside fractions	2023-04-11 20:57:26 +02:00
Calixte Denizet	07b094729e	Fix search in a pdf containing some UTF-32 characters (bug 1820909) Some chars were supposed to have a length equal to 1 but UTF-32 chars can be longer.	2023-03-09 15:03:01 +01:00
Calixte Denizet	fc7d74385f	Don't replace an eol by a whitespace when the last char is a Katakana-Hiragana diacritic	2023-02-16 11:31:58 +01:00
Calixte Denizet	4e9f26afa3	Ignore position of combining diacritics when getting text (bug 1640217)	2023-02-09 17:13:57 +01:00
Jonas Jenwald	c5d6391898	[api-minor] Let the `cMapPacked` parameter, in `getDocument`, default to `true` The initial CMap support was added in PR 4259 using the "raw" Adobe files, however they were quickly deemed to be unnecessarily large. As a result PR 4470 introduced the more compact "binary" CMap format, with both of those PRs being included in the very same release (version `0.8.1334`) . Please note that we've thus never shipped anything except the "binary" CMap files with the PDF library, and furthermore note that we've not even once updated the CMap files since they were originally added almost nine years ago. Requiring users to remember that `cMapPacked = true` is necessary, in addition to setting the `cMapUrl` parameter, in order for CMap loading to work feels like a less than ideal API. Hence this patch, which suggests that we simply let `cMapPacked` default to `true` now.	2023-01-30 15:35:02 +01:00
Calixte Denizet	dc94b750de	[GV] Avoid updating the finder when the results aren't complete At the beginning of a search, an update can be triggered with 0 over 0 found matches. In the GeckoView context, we can't update the finder whenever we want but only when it has been required.	2023-01-20 18:13:16 +01:00
Calixte Denizet	661f425934	[GV] Add an option in the find controller to update matches count only when the last page is reached (bug 1803188). In GeckoView, on an event, a callback must be executed with the result of an action, but the callback can be used only one time. So for each FindInPage event, we must trigger only one matches count update.	2023-01-06 10:56:26 +01:00
Calixte Denizet	69c88477a9	Avoid an infinite loop when searching for a single diacritic	2023-01-02 12:27:07 +01:00
Calixte Denizet	ea1995991b	Don't add an extra space after a Katakana or a Hiragana at the eol when searching	2022-11-29 10:46:48 +01:00
Calixte Denizet	2be64d63e1	Normalize fullwidth, halfwidth and circled chars when searching	2022-11-14 19:27:51 +01:00
Calixte Denizet	6c6f6fb2b8	Don't replace cr by a white space when the last char on the line is an ideographic char	2022-09-04 14:21:05 +02:00
Calixte Denizet	c7afce4210	Support Hangul syllables when searching some text (bug 1771477) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1771477; - hangul contains some syllables which are decomposed when using NFD, hence the text must be correctly shifted in case it contains some of them.	2022-05-28 16:50:03 +02:00
Calixte Denizet	18e79e3c0b	[text selection] Add the whitespaces present in the pdf in the text chunk - it aims to fix issue #14627; - the basic idea of the recent text refactoring was to only consider the rendered visible whitespaces. But sometimes, the heuristics aren't correct and although some whitespaces are in the text stream they weren't in the text chunks because they were too small. Hence we added some exceptions, for example, we always add a whitespace when it is between two non-whitespace chars but only when in the same Tj. So basically, this patch removes the constraint to have the chars in the same Tj (by using a circular buffer to save the last two chars) but doesn't add a space when the visible space is really too small (hence `NOT_A_SPACE_FACTOR`).	2022-03-27 14:34:56 +02:00
Calixte Denizet	18f4e560ae	[Search] Some matches were incorrectly shifted because of some '-\n' - it aims to fix #14562; - 'X-\n' were not correctly positioned; - when X is a diacritic (e.g. in "sä-\n", which is decomposed into "sa¨-\n") we must handle both things: - diacritics on the one hand; - "-\n" on the other hand.	2022-02-14 10:12:33 +01:00
Calixte Denizet	1f41028fcb	Support search with or without diacritics (bug 1508345, bug 916883, bug 1651113) - get the original index using a dichotomic (binary) search instead of a linear one; - normalize the text using NFD; - convert the query string into a RegExp; - replace whitespaces in the query with \s+; - handle hyphens at eol used to break a word; - add some \s* around punctuation signs	2022-02-03 15:42:55 +01:00
Jonas Jenwald	0a19ef6864	Move the `EventBus`, and related functionality, into its own file The size of the `web/ui_utils.js` file has increased over time, as more code has been added to (or moved into) that file. To reduce its size slightly, this patch moves the event-related functionality into a separate file.	2021-12-15 17:18:57 +01:00
calixteman	bbb64369f1	Merge pull request #13424 from calixteman/chunks2 [api-minor] Fix issues in text selection	2021-10-18 06:14:15 -07:00
Calixte Denizet	61d1063276	Fix issues in text selection - PR #13257 fixed a lot of issues but not all and this patch aims to fix almost all remaining issues. - the idea in this new patch is to compare the position of a new glyph with the last position where a glyph has been drawn; - no spaces are "drawn": a space just moves the cursor and isn't added to the chunk; - so this way a space followed by a cursor move can be treated as only one space: it helps to merge all spaces into one. - to tell the difference between real spaces and tracking ones, we used a factor of the space width (from the font) - it was a pretty good idea in general but it fails with some fonts where the space was too big: - in Poppler, they're using a factor of the font size: this is an excellent idea (<= 0.1 * fontSize implies tracking space).	2021-10-17 16:27:05 +02:00
Jonas Jenwald	fa8c0ef616	[api-minor] Change `PDFFindController` to use the "find"-event directly (issue 12731) Looking at the code, I do have to agree with the point made in issue 12731 about it being unexpected/unhelpful that the `PDFFindController.executeCommand`-method isn't directly usable with the "find"-event. The reason for it being this way is, as so often, for historical reasons: The `executeCommand`-method was added (just) prior to the introduction of the `EventBus` in the viewer. Obviously we cannot simply change the existing `PDFFindController.executeCommand`-method, since that'd be a breaking change in code which has existed for over five years. Initially I figured that we could simply add a new method in `PDFFindController` that'd accept the state from the "find"-event, however after thinking about this and looking through the use-cases in the default viewer I settled on a slightly different approach: Let the `PDFFindController` just listen for the "find"-event (on the `EventBus`-instance) directly instead, which also removes one level of (unneeded) indirection during searching in the default viewer. For GENERIC builds of the PDF.js library, the old `PDFFindController.executeCommand`-method is still available with a deprecation warning.	2021-10-16 10:36:22 +02:00
Ross Johnson	6dae2677d5	[api-minor] Highlight search results correctly for normalized text (PR 9448) This patch is a rebased and refactored version of PR 9448, such that it applies cleanly given that `PDFFindController` has changed since that PR was opened; obviously keeping the original author information intact. This patch will thus ensure that e.g. fractions, and other things that we normalize before searching, will still be highlighted correctly in the textLayer. Furthermore, this patch also adds basic unit-tests for this functionality. Note: The `[api-minor]` tag is added, since third-party implementations of the `PDFFindController` must now always use the `pageMatchesLength` property to get accurate length information (see the `web/text_layer_builder.js` changes). Co-authored-by: Ross Johnson <ross@mazira.com> Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-01-12 18:08:08 +01:00
Jonas Jenwald	c42029489e	Run `gulp lint --fix`, to account for changes in Prettier version `2.2.1` Please refer to https://github.com/prettier/prettier/blob/master/CHANGELOG.md#221 for additional details.	2020-11-29 10:01:46 +01:00
Jonas Jenwald	426945b480	Update Prettier to version 2.0 Please note that these changes were done automatically, using `gulp lint --fix`. Given that the major version number was increased, there's a fair number of (primarily whitespace) changes; please see https://prettier.io/blog/2020/03/21/2.0.0.html In order to reduce the size of these changes somewhat, this patch maintains the old "arrowParens" style for now (once mozilla-central updates Prettier we can simply choose the same formatting, assuming it will differ here).	2020-04-14 12:28:14 +02:00
Jonas Jenwald	36881e3770	Ensure that all `import` and `require` statements, in the entire code-base, have a `.js` file extension In order to eventually get rid of SystemJS and start using native `import`s instead, we'll need to provide "complete" file identifiers since otherwise there'll be MIME type errors when attempting to use `import`.	2020-01-04 13:01:43 +01:00
Jonas Jenwald	de36b2aaba	Enable auto-formatting of the entire code-base using Prettier (issue 11444) Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever change). Prettier is being used for a couple of reasons: - To be consistent with `mozilla-central`, where Prettier is already in use across the tree. - To ensure a consistent coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting discussions once and for all, removing the need for review comments on most stylistic matters. Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some). Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that comments won't become too long. Please note: This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a separate commit. (On a more personal note, I'll readily admit that some of the changes Prettier makes are extremely ugly. However, in the name of consistency we'll probably have to live with that.)	2019-12-26 12:34:24 +01:00
Tim van der Meij	ed918bad21	Remove left-over console log from the find controller unit tests	2019-01-12 22:27:40 +01:00
Tim van der Meij	b1cef896f4	Write more unit tests for the find controller Fixes #7356.	2019-01-12 22:17:46 +01:00
Jonas Jenwald	2ed3591b22	Make `PDFFindController` less confusing to use, by allowing searching to start when `setDocument` is called This patch is based on something that I noticed while working on PR 10126. The recent re-factoring of `PDFFindController` brought many improvements, among those the fact that access to `BaseViewer` is no longer required. However, with these changes there's one thing which now strikes me as not particularly user-friendly[1]: The fact that in order for searching to actually work, `PDFFindController.setDocument` must be called and a 'pagesinit' event must be dispatched (from somewhere). For all other viewer components, calling the `setDocument` method[2] is enough in order for the component to actually be usable. The `PDFFindController` thus stands out quite a bit, and it also becomes difficult to work with in any sort of custom implementation. For example: Imagine someone trying to use `PDFFindController` separately from the viewer[3], which should now be relatively simple given the re-factoring, and thus having to (somehow) figure out that they'll also need to manually dispatch a 'pagesinit' event for searching to work. Note that the above even affects the unit-tests, where an out-of-place 'pagesinit' event is being used. To attempt to address these problems, I'm thus suggesting that only `setDocument` should be used to indicate that searching may start. For the default viewer and/or the viewer components, `BaseViewer.setDocument` will now call `PDFFindController.setDocument` when the document is ready, thus requiring no outside configuration anymore[4]. For custom implementation, and the unit-tests, it's now as simple as just calling `PDFFindController.setDocument` to allow searching to start. --- [1] I should have caught this during review of PR 10099, but unfortunately it's sometimes not until you actually work with the code in question that things like these become clear. 
[2] Assuming, obviously, that the viewer component in question actually implements such a method :-) [3] There's even a very recent issue, filed by someone trying to do just that. [4] Short of providing a `PDFFindController` instance when creating a `BaseViewer` instance, of course.	2018-10-04 10:28:50 +02:00
Tim van der Meij	1b402996cf	Implement a basic unit test for the find controller This commit shows that we can now unit test the find controller and that executing regular queries works. Note that this is only a first step and not a complete suite of unit tests for all possible options of the find controller. While writing this unit test, I found two smaller issues that I addressed directly. The first one is that in the previous find controller refactoring I forgot to rename some occurrences of a now private member variable. Fortunately this did not cause any bugs since we did have a public getter and the fetched value may be changed by reference, but it's nevertheless good to fix. The second issue is that some entries in the `test/unit/clitests.json` file were not correct, resulting in these tests not being executed on e.g., Travis CI.	2018-09-30 18:32:34 +02:00
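
The diacritics-insensitive search outlined in commit 1f41028fcb (normalize the text with NFD, convert the query into a RegExp, replace whitespace in the query with \s+) can be illustrated with a minimal sketch. This is not the PDF.js implementation: the names `stripDiacritics`, `escapeRegExp`, `buildQueryRegExp`, `findMatches` and the `matchDiacritics` option are hypothetical and chosen for illustration only. The sketch also reports match positions in the normalized text and leaves out the mapping back to original indices (the binary search mentioned in the commit), as well as the hyphen and punctuation handling.

```javascript
// Illustrative sketch only; all names here are hypothetical, not PDF.js APIs.

// Decompose with NFD, then drop combining marks so "é" compares equal to "e".
function stripDiacritics(text) {
  return text.normalize("NFD").replace(/\p{M}/gu, "");
}

// Escape RegExp syntax characters so the query is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn a query string into a RegExp where any run of whitespace in the
// query matches any run of whitespace (\s+) in the page text.
function buildQueryRegExp(query, { matchDiacritics = false } = {}) {
  const normalized = matchDiacritics
    ? query.normalize("NFD")
    : stripDiacritics(query);
  const pattern = normalized
    .trim()
    .split(/\s+/)
    .map(escapeRegExp)
    .join("\\s+");
  return new RegExp(pattern, "giu");
}

// Return { index, length } for each match. Note: both values refer to the
// normalized text, not to the original page text.
function findMatches(pageText, query, options = {}) {
  const haystack = options.matchDiacritics
    ? pageText.normalize("NFD")
    : stripDiacritics(pageText);
  const re = buildQueryRegExp(query, options);
  return [...haystack.matchAll(re)].map(m => ({
    index: m.index,
    length: m[0].length,
  }));
}
```

With these helpers, `findMatches("Le café  noir", "cafe noir")` finds the phrase despite the accent and the double space, whereas passing `{ matchDiacritics: true }` makes the same query fail to match. A real implementation additionally has to translate the normalized positions back to the original text before highlighting, which is the step the commit addresses with a binary search.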