pdf.js

Author	SHA1	Message	Date
Calixte Denizet	3091e70aad	Flush the current chunk when the font changed because of a restore op (issue #14755 )	2023-05-18 19:37:16 +02:00
Calixte Denizet	385f275ad9	Warn when pdf.js can't load an OS font	2023-05-16 14:58:38 +02:00
Jonas Jenwald	cb1a10e358	Check the `css` property in the `getFontSubstitution` unit-tests Given that the `css` property isn't constant, since it contains document/font ids, we cannot just check it directly. However, we can make use of regular expressions to ensure that the format is generally correct.	2023-05-14 19:11:35 +02:00
calixteman	4101128c09	Merge pull request #16421 from calixteman/font_subst_test Add tests for the font substitution	2023-05-14 18:23:12 +02:00
Calixte Denizet	89140fcd98	Add tests for the font substitution	2023-05-14 18:07:03 +02:00
Jonas Jenwald	8fbd6755eb	Enable the `unicorn/no-useless-promise-resolve-reject` ESLint plugin rule Please see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/no-useless-promise-resolve-reject.md Note that this patch also re-sorts the existing `unicorn`-rules in proper alphabetical order.	2023-05-13 11:30:25 +02:00
Jonas Jenwald	8f3940fbf3	Move the sidebar-resizing handling into the `PDFSidebar` class Originally the `PDFSidebarResizer` class was slightly larger, since the code used to contain e.g. feature testing for older (and no longer supported) browsers. Given that there's some amount of overlap, when it comes to what DOM-elements and state that these classes need, it now seems reasonable to simply move the sidebar-resizing into the `PDFSidebar` class. For the MOZCENTRAL build-target this patch reduces the size of the built `web/viewer.js` file by just over `1.1` kilobytes.	2023-05-12 10:00:12 +02:00
Calixte Denizet	cfb908c999	Add a cache to avoid loading a local font several times On my computer, it takes a few tenths of a second to load a local font. Since a font can be used several times in a document, the cache will improve performance.	2023-05-10 20:01:21 +02:00
Calixte Denizet	2486536843	Compress the data when saving annotations The CompressionStream API has been added in Firefox 113 (see https://bugzilla.mozilla.org/show_bug.cgi?id=1823619), hence we can use it to compress the streams with added/modified annotations.	2023-05-09 14:46:50 +02:00
Jonas Jenwald	317abd6d07	Change the `createPromiseCapability` helper function into a `PromiseCapability` class This is not only slightly more compact, but it also simplifies the handling of the `settled` getter.	2023-04-29 13:43:24 +02:00
Calixte Denizet	19ca41896e	Correctly clip the text in the text layer (fixes #16316 )	2023-04-18 17:00:42 +02:00
Calixte Denizet	117bbf7cd9	[api-minor] Don't normalize the text used in the text layer. Some Arabic chars like \ufe94 could be searched for in a pdf, hence they must be normalized when creating the search query. So to avoid duplicating the normalization code, everything is moved into the find controller. The previous code to normalize text was using NFKC but with a hardcoded map, hence it has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size by 30kb). In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into account some RTL unicode ranges, the generated font wasn't embedding the mapping for this char and the unicode ranges in the OS/2 table weren't up-to-date. When normalized, some chars can be replaced by several ones, which led to some extra chars in the text layer. To avoid any regression, when copying some text from the text layer, the copied string is normalized (NFKC) before being put in the clipboard (it works like this in either Acrobat or Chrome).	2023-04-17 14:31:23 +02:00
Jonas Jenwald	0e19c3a120	[api-minor] Add support, in `PDFFindController`, for mixing phrase/word searches (issue 7442) Please note: This patch only extends the `PDFFindController` implementation itself to support this functionality, however it's purposely not exposed in the default viewer. This replaces the previous `phraseSearch`-parameter, and a `query`-string will now always be interpreted as a phrase-search. To enable searching for individual words, the `query`-parameter must instead consist of an Array of strings. This way it's now also possible to combine phrase/word searches, with a `query`-parameter looking something like `["Lorem ipsum", "foo", "bar"]` which will search for the phrase "Lorem ipsum" and the words "foo" respectively "bar".	2023-04-15 13:32:37 +02:00
Calixte Denizet	d8795f9f8f	Fix search of numbers inside fractions	2023-04-11 20:57:26 +02:00
Jonas Jenwald	5063a6f2a9	[api-minor] Remove the `disableCombineTextItems` option Please note: This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons. This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous default text-extraction behaviour. However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in more (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together. Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.	2023-03-30 14:23:38 +02:00
Calixte Denizet	a96f10e55d	Create a new chunk when the char is raised too much compared to the previous one	2023-03-28 13:56:46 +02:00
Jonas Jenwald	1fc09f0235	Enable the `unicorn/prefer-string-replace-all` ESLint plugin rule Note that the `replaceAll` method still requires that a global regular expression is used, however by using this method it's immediately obvious when looking at the code that all occurrences will be replaced; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll#parameters Please find additional details at https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-string-replace-all.md	2023-03-23 12:57:10 +01:00
Jonas Jenwald	5f64621d46	Use `String.prototype.replaceAll()` where appropriate This fairly new method allows replacing multiple occurrences within a string without having to use regular expressions. Please refer to: - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll#browser_compatibility	2023-03-22 15:31:10 +01:00
Jonas Jenwald	137a2d6e30	Add even more non-standard ligatures (PR 15517 follow-up) Given that we already create multi-byte ToUnicode entries in other cases, see e.g. the `getNormalizedUnicodes` table, this is hopefully fine.	2023-03-22 10:42:52 +01:00
Jonas Jenwald	9321758d91	Merge pull request #16186 from Snuffleupagus/issue-16176 Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176)	2023-03-21 22:17:18 +01:00
Jonas Jenwald	d4bcfe8c16	Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176) Hopefully this makes sense, since we already "create" multi-byte ToUnicode entries in other cases (see e.g. the `getNormalizedUnicodes` table).	2023-03-21 21:35:57 +01:00
calixteman	8bfebf1c24	Merge pull request #16188 from calixteman/bug1823296 Use the position of the previous xref stream if any when saving a pdf (bug 1823296)	2023-03-21 21:21:49 +01:00
Calixte Denizet	2d0f30a67c	Use the position of the previous xref stream if any when saving a pdf (bug 1823296)	2023-03-21 19:27:24 +01:00
Jonas Jenwald	c4a725fe98	Fix the `transfer` parameter, for `structuredClone`, in the `LoopbackPort` The way that we handle the `transfer` parameter is unfortunately wrong, ever since PR 14392 which introduced the code, given that the MDN article originally contained incorrect information; please see https://github.com/mdn/content/pull/23164 By updating the `structuredClone` call such that it works correctly, we can enable more unit-tests in Node.js environments; please refer to https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#parameters	2023-03-19 22:04:01 +01:00
Jonas Jenwald	0e54a3c37a	Warn about missing/incorrect `--scale-factor` CSS-variable in `renderTextLayer` (issue 16139) Unfortunately I don't believe that we can simply add a default `--scale-factor` CSS-variable to the `container`-element, since that might not be entirely appropriate/correct in all cases.[1] However, we can at least print a console-error to hopefully make this situation more apparent to users. (This is purposely not using the `warn` helper-function, since those messages can be disabled.) --- [1] One example is in our reference-tests, where we don't need to add it to the `container`-element itself.	2023-03-16 11:53:12 +01:00
calixteman	b2a86350fc	Merge pull request #16096 from bungeman/fix_trig_functions Correct PostScript trigonometric operators	2023-03-11 14:32:23 +01:00
Calixte Denizet	07b094729e	Fix search in a pdf containing some UTF-32 characters (bug 1820909) Some chars were supposed to have a length equal to 1 but UTF-32 chars can be longer.	2023-03-09 15:03:01 +01:00
Calixte Denizet	b8dda089e2	Slightly modify the max width of a tracking space	2023-03-07 19:38:49 +01:00
Ben Wagner	158c836e26	Correct PostScript trigonometric operators PDF 32000-1:2008 7.10.5.1 "Type 4 (PostScript Calculator) Functions" defers to the PostScript Language Reference for the description of these functions. The PostScript Language Reference, third edition chapter 8 "Operators" defines the `angle` type as a "number of degrees". Section 8.1 defines "angle `sin` real", "angle `cos` real", and "num den `atan` angle". The documentation for `atan` further states that it will return an angle in degrees between 0 and 360. Handle these operators correctly in `PostScriptEvaluator.execute`. Convert the inputs to `sin` and `cos` from degrees to radians for use with `Math.sin` and `Math.cos`. Correctly pop two values from the stack for `atan`, use `Math.atan2`, and convert from radians to (positive) degrees.	2023-03-03 17:25:11 -05:00
Calixte Denizet	fd03cd5493	[api-minor] Generate images in the worker instead of the main thread. We introduced the use of OffscreenCanvas in #14754 and this patch aims to use them for all kinds of images. It'll slightly improve performance (and maybe slightly decrease memory use). An image can be rendered using some transfer maps, but with OffscreenCanvas we don't have the underlying pixel array, so the transfer maps are re-implemented using the SVG filter feComponentTransfer.	2023-03-01 17:40:12 +01:00
Jonas Jenwald	f42a2e8451	[api-minor] Move the `canvasFactory` option into `getDocument` Rather than repeatedly initializing a `canvasFactory`-instance for every page, move it to the document-level instead. Please note: This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.	2023-03-01 09:07:16 +01:00
Calixte Denizet	3a21423386	[Acroform] Use the full path to find the node in the XFA datasets where to store the value I noticed several 'Path not found' errors because of a field called #subform[2]. From the XFA specs, the hash is used for a class of elements in the template tree. When we're looking for a node in the datasets tree, it doesn't make sense to search for a class. Hence the path elements starting with a hash are just skipped.	2023-02-23 12:09:39 +01:00
Calixte Denizet	fc7d74385f	Don't replace an eol by a whitespace when the last char is a Katakana-Hiragana diacritic	2023-02-16 11:31:58 +01:00
Jonas Jenwald	6d4d402a78	Move the `arrayBuffersToBytes` helper function into the worker-thread Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the built `pdf.js` and `pdf.worker.js` files.	2023-02-11 21:34:37 +01:00
Calixte Denizet	4e9f26afa3	Ignore position of combining diacritics when getting text (bug 1640217)	2023-02-09 17:13:57 +01:00
Jonas Jenwald	90ffbc1d39	Remove most build-time `require` statements from the viewer (PR 16009 follow-up) This further extends the web-specific import maps introduced in PR 16009, to allow removing most of the build-time `require` statements from the viewer. The few remaining ones are fallbacks used for the COMPONENTS respectively the `legacy` GENERIC builds.	2023-02-07 22:45:19 +01:00
calixteman	ecd86ccffc	Merge pull request #16020 from calixteman/bug1815476 [Annotation] Avoid to encrypt the appearance stream two times (bug 1815476)	2023-02-07 20:57:49 +01:00
Calixte Denizet	ea7b4b4d6c	[Annotation] Avoid to encrypt the appearance stream two times (bug 1815476)	2023-02-07 19:26:46 +01:00
Jonas Jenwald	a98e80c4ff	[GeckoView] Reduce the size of the built viewer Given that the GV-viewer isn't using most of the UI-related components of the default-viewer, we can avoid including them in the built viewer to save space.[1] The least "invasive" way of implementing this, at least that I could come up with, is to leverage import maps with suitable stubs for the GV-viewer. The one slightly annoying thing is that we now have larger import maps across multiple html-files, and you'll need to remember to update all of them when making future changes. --- [1] With this patch, the built `viewer.js` size is 391 kB and `viewer-geckoview.js` is 285 kB.	2023-02-05 14:12:32 +01:00
Tim van der Meij	e848a0e61c	Merge pull request #15981 from Snuffleupagus/cMapPacked-true [api-minor] Let the `cMapPacked` parameter, in `getDocument`, default to `true`	2023-02-04 15:00:26 +01:00
Jonas Jenwald	851c394e64	Remove the `isEmptyObj` unit-test helper function We should be able to let Jasmine simply compare directly against an actually empty Object, rather than using a manually implemented helper function for that.	2023-02-04 12:43:53 +01:00
Jonas Jenwald	c5d6391898	[api-minor] Let the `cMapPacked` parameter, in `getDocument`, default to `true` The initial CMap support was added in PR 4259 using the "raw" Adobe files, however they were quickly deemed to be unnecessarily large. As a result PR 4470 introduced the more compact "binary" CMap format, with both of those PRs being included in the very same release (version `0.8.1334`) . Please note that we've thus never shipped anything except the "binary" CMap files with the PDF library, and furthermore note that we've not even once updated the CMap files since they were originally added almost nine years ago. Requiring users to remember that `cMapPacked = true` is necessary, in addition to setting the `cMapUrl` parameter, in order for CMap loading to work feels like a less than ideal API. Hence this patch, which suggests that we simply let `cMapPacked` default to `true` now.	2023-01-30 15:35:02 +01:00
Calixte Denizet	6f4d037a8e	[JS] Correctly format field with numbers (bug 1811694, bug 1811510) In PR #15757, a value is automatically converted into a number when it's possible but the case of numbers like "000123" has been overlooked and their format must be preserved. When a script is doing something like "foo.value + bar.value" and the values are numbers then "foo.value" must return a number but the displayed value must be what the user entered or what a script set, so this patch just adds a field _orginalValue in order to track the value as it was defined. Some people are used to using a comma as decimal separator, hence it must be considered when a value is parsed into a number. This patch is fixing a regression introduced by #15757.	2023-01-26 14:57:02 +01:00
Tim van der Meij	a27d7ba524	Merge pull request #15943 from Snuffleupagus/deprecate-direct-PDFDataRangeTransport [api-minor] Deprecate calling `getDocument` directly with a `PDFDataRangeTransport`-instance	2023-01-21 13:50:20 +01:00
Calixte Denizet	dc94b750de	[GV] Avoid to update the finder when the results aren't complete At the beginning of a search an update can be triggered with 0 of 0 found matches. In the GeckoView context, we can't update the finder whenever we want but only when it has been required.	2023-01-20 18:13:16 +01:00
Jonas Jenwald	7976fc7851	[api-minor] Deprecate calling `getDocument` directly with a `PDFDataRangeTransport`-instance In general it's recommended to pass a parameter object when calling the `getDocument`-function in the API, since that's the only way to provide additional options, and the fact that it also accepts a URL or TypedArray directly is now mostly for backwards compatibility reasons. However, the `getDocument`-function also accepts a direct `PDFDataRangeTransport`-instance which just seems unnecessary. Please note: The `PDFDataRangeTransport`-implementation was added specifically for the built-in Firefox PDF Viewer, however it's most likely not commonly used by any third-party (given that it requires manual PDF-data loading). Furthermore, the default-viewer always provides a parameter object when calling the `getDocument`-function and it's thus completely unaffected by these changes.	2023-01-19 14:25:55 +01:00
Jonas Jenwald	8f3fa18c93	Merge pull request #15920 from Snuffleupagus/transfer-pdf-data [api-minor] Enable transferring of TypedArray PDF data by default (PR 15908 follow-up)	2023-01-16 13:20:57 +01:00
Tim van der Meij	9e3adb5ec7	Implement unit tests for the `numberToString` utility function	2023-01-14 15:09:58 +01:00
Tim van der Meij	a6dfcc89fa	Implement unit tests for the `recoverJsURL` utility function	2023-01-14 15:09:58 +01:00
Jonas Jenwald	397f943ca3	[api-minor] Enable transferring of TypedArray PDF data by default (PR 15908 follow-up) This patch removes the recently introduced `transferPdfData` API-option, and simply enables transferring of TypedArray data by default instead of copying it. This will help reduce main-thread memory usage, however it will take ownership of the TypedArrays. Currently this only applies to the following cases: - TypedArrays passed to the `getDocument`-function in the API, in order to open PDF documents from binary data. - TypedArrays passed to a `PDFDataRangeTransport`-instance, used to support custom PDF document fetching/loading (see e.g. the Firefox PDF Viewer). PLEASE NOTE: To avoid being affected by this, please simply copy any TypedArray data before passing it to either of the functions/methods mentioned above. Now that we transfer TypedArray data that we previously only copied, we need to be more careful with input validation. Given how the `{IPDFStreamReader, IPDFStreamRangeReader}.read` methods will always return ArrayBuffer data, which is then transferred to the worker-thread[1], the actual TypedArray data passed to the API thus need to have the same exact size as its underlying ArrayBuffer to prevent issues. Hence we'll check for this and only allow transferring of safe TypedArray data, and fallback to simply copying the data just as before. This obviously shouldn't be an issue in the Firefox PDF Viewer, but for the general PDF.js library we need to be more careful here. --- [1] See `e09ad99973/src/display/api.js (L2492-L2506)` respectively `e09ad99973/src/display/api.js (L2578-L2590)`	2023-01-14 10:39:36 +01:00
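
The safety check described in commit 397f943ca3 (transfer a TypedArray only when it spans its entire underlying ArrayBuffer, and otherwise fall back to copying) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PDF.js implementation: the function name `prepareBinaryData` and the shape of its return value are hypothetical.

```javascript
// Hypothetical helper (not part of the PDF.js API): decide whether binary
// data can be transferred to a worker as-is, or must be copied first.
function prepareBinaryData(data) {
  if (!ArrayBuffer.isView(data)) {
    throw new TypeError("Expected a TypedArray or DataView.");
  }
  // Transferring moves the *whole* ArrayBuffer, so it is only safe when
  // the view covers the buffer exactly.
  const coversWholeBuffer =
    data.byteOffset === 0 && data.byteLength === data.buffer.byteLength;

  if (coversWholeBuffer) {
    // The caller loses access to `data` once the buffer is transferred.
    return { data, transfer: [data.buffer] };
  }
  // Fallback: copy only the bytes that the view covers and transfer nothing,
  // leaving the caller's original buffer untouched.
  const copy = new Uint8Array(
    data.buffer,
    data.byteOffset,
    data.byteLength
  ).slice();
  return { data: copy, transfer: [] };
}
```

The returned `transfer` list would then be handed to something like `postMessage(message, transfer)`. Callers who need to keep their own data can copy it before passing it in, as the commit message recommends.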

1186 Commits
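
The operator semantics described in commit 158c836e26 (`sin` and `cos` take an angle in degrees; `atan` pops two operands and returns an angle in degrees between 0 and 360) can be illustrated with a small stack-based sketch. The function names below are hypothetical and the stack is a plain array; this is not the code of `PostScriptEvaluator.execute`, only an illustration of the conversions the commit message describes.

```javascript
// Illustrative sketch only; function names are assumptions, not PDF.js code.
const DEG_TO_RAD = Math.PI / 180;

// "angle sin real": pop an angle in degrees, push its sine.
function psSin(stack) {
  const angle = stack.pop();
  stack.push(Math.sin(angle * DEG_TO_RAD));
}

// "angle cos real": pop an angle in degrees, push its cosine.
function psCos(stack) {
  const angle = stack.pop();
  stack.push(Math.cos(angle * DEG_TO_RAD));
}

// "num den atan angle": pop den (top of stack), then num, and push the
// angle in degrees, shifted into the non-negative range.
function psAtan(stack) {
  const den = stack.pop();
  const num = stack.pop();
  let degrees = Math.atan2(num, den) / DEG_TO_RAD;
  if (degrees < 0) {
    degrees += 360;
  }
  stack.push(degrees);
}
```

Note the operand order: because `num` is pushed before `den`, `den` is popped first. `Math.atan2` returns radians in the range -π to π, which is why negative results are shifted by 360 after the conversion to degrees.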