Sakurai/pdf.js - pdf.js - Gitea on kemo

Sakurai/pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	1fc09f0235	Enable the `unicorn/prefer-string-replace-all` ESLint plugin rule Note that the `replaceAll` method still requires that a global regular expression is used, however by using this method it's immediately obvious when looking at the code that all occurrences will be replaced; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll#parameters Please find additional details at https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-string-replace-all.md	2023-03-23 12:57:10 +01:00
Jonas Jenwald	137a2d6e30	Add even more non-standard ligatures (PR 15517 follow-up) Given that we already create multi-byte ToUnicode entries in other cases, see e.g. the `getNormalizedUnicodes` table, this is hopefully fine.	2023-03-22 10:42:52 +01:00
Jonas Jenwald	d4bcfe8c16	Support multi-byte ToUnicode entries, when using predefined CMaps (issue 16176) Hopefully this makes sense, since we already "create" multi-byte ToUnicode entries in other cases (see e.g. the `getNormalizedUnicodes` table).	2023-03-21 21:35:57 +01:00
Jonas Jenwald	6839f15a32	Merge pull request #16128 from Snuffleupagus/issue-16127 Support (rare) Type3 fonts with Pattern resources (issue 16127)	2023-03-08 12:21:53 +01:00
Jonas Jenwald	e5427ab11b	Merge pull request #16122 from Snuffleupagus/rm-onUnsupportedFeature [api-minor] Remove the deprecated `onUnsupportedFeature` functionality (PR 15758 follow-up)	2023-03-08 12:16:27 +01:00
Calixte Denizet	e9474f1c84	[api-minor] Add an option to set the max canvas area	2023-03-08 10:37:06 +01:00
Jonas Jenwald	471aef5fc6	Support (rare) Type3 fonts with Pattern resources (issue 16127) This simply extends the approach in PR 10727 to also cover Patterns, which shouldn't be a common occurrence in Type3 fonts (since this is the first issue we've seen).	2023-03-08 09:20:52 +01:00
Calixte Denizet	b8dda089e2	Slightly modify the max width of a tracking space	2023-03-07 19:38:49 +01:00
Jonas Jenwald	2f3dcc2327	[api-minor] Remove the deprecated `onUnsupportedFeature` functionality (PR 15758 follow-up) This was deprecated in PR 15758, which has now been included in three official PDF.js releases. While PR 15880 did limit the bundle-size impact of this functionality on e.g. the Firefox PDF Viewer, it still leads to some unnecessary "bloat" that these changes remove. Furthermore, with this being deprecated there'd also be no effort put into e.g. extending the `UNSUPPORTED_FEATURES` list when handling future error cases.	2023-03-07 10:18:43 +01:00
Calixte Denizet	05b0c9d7e6	Render large images even if they're larger than the canvas limits (bug 1720282) The idea is to encode a large image in BMP format (which is very simple and doesn't require computing any checksums) and then use createImageBitmap with a BMP blob (which doesn't suffer from the Canvas/ImageData limits). From a performance point of view, it isn't great (generating a large blob + decoding it on the main thread is really not ideal) but at least we have something to display, which is far better than a blank page (and one can notice that most of the time is spent in decoding the image from the pdf stream).	2023-03-05 14:07:07 +01:00
Calixte Denizet	fd03cd5493	[api-minor] Generate images in the worker instead of the main thread. We introduced the use of OffscreenCanvas in #14754 and this patch aims to use them for all kinds of images. It'll slightly improve performance (and maybe slightly decrease memory use). Since an image can be rendered using some transfer maps, but with OffscreenCanvas we don't have the underlying pixel array, the transfer-map handling is re-implemented using the SVG filter feComponentTransfer.	2023-03-01 17:40:12 +01:00
Jonas Jenwald	45c332110e	Check `OffscreenCanvas` support once on the worker-thread Currently we repeat the `FeatureTest.isOffscreenCanvasSupported` checks all over the worker-thread code, and with upcoming changes this will become even "worse". Hence this patch, which changes the worker-thread default value for the `isOffscreenCanvasSupported`-parameter to `false` and moves the feature-testing into the `BasePdfManager`-constructor. Please note: This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.	2023-02-27 12:27:28 +01:00
Calixte Denizet	4e9f26afa3	Ignore position of combining diacritics when getting text (bug 1640217)	2023-02-09 17:13:57 +01:00
Jonas Jenwald	1a69d537c1	[api-minor] Limit the `PDFDocumentLoadingTask.onUnsupportedFeature` functionality to GENERIC builds (PR 15758 follow-up) This was deprecated in PR 15758 but it's unfortunately quite difficult to tell if third-party users are depending on this, e.g. to implement custom error reporting, and if so to what extent. However, thanks to the pre-processor we can limit most of this code to GENERIC builds which still seem like a worthwhile change. These changes reduce the bundle size of the Firefox PDF Viewer by 3.8 kB in total.	2023-01-01 17:53:12 +01:00
Jonas Jenwald	0c1fb4e740	[api-minor] Remove the `PDFDocumentProxy.stats` getter (PR 15758 follow-up) This was deprecated in PR 15758 and given that it's quite unlikely that any third-party users are relying on this functionality, since it was only ever added to support telemetry reporting in the Firefox PDF Viewer, it should hopefully be fine to remove this fairly quickly. These changes reduce the bundle size of the Firefox PDF Viewer by 4.5 kB in total.	2023-01-01 17:06:47 +01:00
Jonas Jenwald	f7449563ef	Merge pull request #15659 from sxyuan/system-font-name-fix [api-minor] Propagate the translated font name to TextContentItem for system fonts	2022-11-08 21:56:49 +01:00
Samuel Yuan	36fb5c1e2b	Propagate the translated font name to TextContentItems. This allows font data for system fonts to be looked up in the PDFObjects.	2022-11-08 11:16:21 -08:00
Jonas Jenwald	c8868a1c7a	[api-minor] Initialize the unicode-category lazily on the `Glyph`-instance The purpose of this patch is twofold: - Initialize the unicode-category data lazily during text-extraction, since this is completely unused during general parsing/rendering. - Stop exposing this data in the API, since it's unused on the main-thread and it seems like it was accidentally included. Obviously these changes are API-observable, but hopefully no user is depending on this. Furthermore, it's trivial for a user to re-create this unicode-category data manually with a regular expression (from the exposed `unicode` property).	2022-11-05 10:12:17 +01:00
Jonas Jenwald	c33b8d7692	Cache the normalized unicode-value on the `Glyph`-instance Currently, during text-extraction, we're repeatedly normalizing and (when necessary) reversing the unicode-values every time. This seems a little unnecessary, since the result won't change, hence this patch moves that into the `Glyph`-instance and makes it lazily initialized. Taking the `tracemonkey.pdf` document as an example: When extracting the text-content there's a total of 69236 characters but only 595 unique `Glyph`-instances, which means a 99.1 percent cache hit-rate. Generally speaking, the longer a PDF document is the more beneficial this should be. Please note: The old code is fast enough that it unfortunately seems difficult to measure a (clear) performance improvement with this patch, so I completely understand if it's deemed an unnecessary change.	2022-11-03 22:36:53 +01:00
Jonas Jenwald	1e7274e9c6	[api-minor] Move the handling of unbalanced markedContent to the worker-thread (PR 15630 follow-up)	2022-10-27 11:14:54 +02:00
Jonas Jenwald	fa47d4b9b1	Slightly re-factor `PartialEvaluator._simpleFontToUnicode` Given the sheer number of heuristics added to this method over the years, moving the valid unicode found case to the top should improve readability of the code.	2022-10-13 21:42:57 +02:00
Jonas Jenwald	1ea4c4b519	[api-minor] Make `isOffscreenCanvasSupported` configurable via the API (issue 14952) This patch first of all makes `isOffscreenCanvasSupported` configurable, defaulting to `true` in browsers and `false` in Node.js environments, with a new `getDocument` parameter. While you normally want to use this, in order to improve performance, it should still be possible for users to control it (similar to e.g. `isEvalSupported`). The specific problem, as reported in issue 14952, is that the SVG back-end doesn't support the new ImageMask data-format that's introduced in PR 14754. In particular: - When the SVG back-end is used in Node.js environments, this patch will "just work" without the user needing to make any code changes. - If the SVG back-end is used in browsers, this patch will require that `isOffscreenCanvasSupported: false` is added to the `getDocument`-call.	2022-10-07 00:10:46 +02:00
Jonas Jenwald	60f6272ed9	Use more `for...of` loops in the code-base Most, if not all, of this code is old enough to predate the general availability of `for...of` iteration.	2022-10-03 13:08:38 +02:00
Jonas Jenwald	6538409282	Replace some `Array.prototype`-usage with spread syntax We have a few, quite old, call-sites that use the `Array.prototype`-format and which can now be replaced with spread syntax instead.	2022-09-23 09:35:30 +02:00
Calixte Denizet	198e9a3db1	Initialize values in the path bounding box before flushing the operator list (bug 1791583) OperatorList.addOp can trigger a flush if it's required, hence the values passed to it must be correctly initialized in order to avoid some wrong values in the renderer. Because of that a clip path was considered as empty, nothing was clipped, hence the wrong rendering in bug 1791583.	2022-09-20 20:01:54 +02:00
Jonas Jenwald	bb75b36b77	Replace some unnecessary `String.prototype.search` usage Most of the `String.prototype.search` call-sites found throughout the code-base are actually not necessary, since we usually only want a boolean, and those can be replaced with `RegExp.prototype.test` instead.	2022-09-19 12:51:46 +02:00
Jonas Jenwald	f6db7975c5	Enable the ESLint `prefer-spread` rule Note that in a couple of spots the argument could be `undefined` and there we simply disable the rule instead. Please refer to https://eslint.org/docs/latest/rules/prefer-spread	2022-08-06 10:17:00 +02:00
Jonas Jenwald	60bd9580e2	Ignore invalid /CIDToGIDMap-entries when parsing fonts (issue 15139) In the referenced PDF document the fonts have /CIDToGIDMap-entries that cannot be loaded. Hence, only when `ignoreErrors` is set, we'll now ignore these corrupt /CIDToGIDMap-entries and fallback to simply assume that no such data is available. Given that this is clearly a case of a corrupt PDF document, there's no guarantee that this will "fix" things in the general case since a /CIDToGIDMap may be required in order for some composite fonts to render correctly. However, attempting to render something is surely better than skipping a font altogether.	2022-07-20 11:58:44 +02:00
Jonas Jenwald	acd61a138e	Handle errors in the "Loading by ref" code-path in `PartialEvaluator.loadFont` Note how we currently throw a "raw" Error, which is problematical since all of the `PartialEvaluator.loadFont` call-sites expect a Promise to be returned. Furthermore, this also means that we don't benefit from the fallback code-path that now exists below. Please note: Unfortunately I don't have a test-case that fails without this patch, since it's something I happened to notice when reading the code while working on another patch.	2022-07-15 16:33:36 +02:00
Jonas Jenwald	79cfc548fc	Improve text-selection for Type3 fonts with bogus /FontBBox-entries (issue 14999) This extends PR 13461, by also building a fallback bounding box for Type3 fonts that contain a much too small /FontBBox-entry. Please note: While this patch improves things overall, copy-and-pasting still doesn't work perfectly for this document. In particular the lowercase letter "c" cannot be selected/copied, however this can be reproduced in both Adobe Reader and PDFium (in Google Chrome) too, which is caused by a lack of proper /ToUnicode-data in the PDF document.	2022-07-05 14:27:14 +02:00
Calixte Denizet	3789dab307	Always flush the current item with MarkedContent stuff when getting text (#15094)	2022-06-25 17:19:57 +02:00
Jonas Jenwald	9ac4536693	Enable the `unicorn/prefer-at` ESLint plugin rule (PR 15008 follow-up) Please find additional information here: - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/at - https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-at.md	2022-06-09 21:21:19 +02:00
Jonas Jenwald	5a2899c57e	Skip bogus `d1` operators in Type3-glyphs (issue 14953) In the `src/display/canvas.js` code the `d1` operator will be used to set the clipping region, and it obviously cannot be empty since that prevents the Type3-glyph from rendering. Also, the patch removes an outdated comment; refer to PR 12718.	2022-05-24 12:20:31 +02:00
Jonas Jenwald	5a774b7ed3	Adjust the heuristics for handling of incomplete path operators (issue 14917) This limits the heuristics for handling of incomplete path operators, see PR 9838, to only apply to sequences of such operators. In practice a couple of invalid path operators are (hopefully) unlikely to completely break rendering, whereas a sequence of them will easily lead to fairly chaotic rendering artifacts.	2022-05-15 11:24:39 +02:00
Jonas Jenwald	8267fd8a52	Replace the `AnnotationStorage.lastModified`-getter with a proper hash-method The current `lastModified`-getter, which only contains a time-stamp, is a fairly crude way of detecting if the stored data has actually been changed. In particular, when the `getRawValue`-method is used, the `lastModified`-getter doesn't cope with data being modified from the "outside". To fix these issues[1], and to prevent any future bugs in this code, this patch introduces a new `AnnotationStorage.hash`-getter which computes a hash of the currently stored data. To simplify things this re-uses the existing `MurmurHash3_64`-implementation, which required moving that file into the `src/shared/`-folder, since its performance should be good enough here. --- [1] Given how the `AnnotationStorage.lastModified`-getter was used, this would have been limited to printing of forms.	2022-05-04 15:21:30 +02:00
Jonas Jenwald	fbf6dee8ee	[api-minor] Remove the `forceClamped`-functionality in the Streams (issue 14849) As it turns out, most of the code-paths in the `PDFImage`-class won't actually pass the TypedArray (containing the image-data) to the `ColorSpace`-code. Hence we generally don't need to force the image-data to be a `Uint8ClampedArray`, and can just as well directly use a `Uint8Array` instead. In the following cases we're returning the data without any `ColorSpace`-parsing, and the exact TypedArray used shouldn't matter: - `b72a448327/src/core/image.js (L714)` - `b72a448327/src/core/image.js (L751)` In the following cases the image-data is only used internally, and again the exact TypedArray used shouldn't matter: - `b72a448327/src/core/image.js (L762)` with the actual image-data being defined (as `Uint8ClampedArray`) further below - `b72a448327/src/core/image.js (L837)` Please note: This is tagged `api-minor` because it's API-observable, given that some image/mask-data will now be returned as `Uint8Array` rather than using `Uint8ClampedArray` unconditionally. However, that seems like a small price to pay to (slightly) reduce memory usage during image-conversion.	2022-04-29 14:46:30 +02:00
Jonas Jenwald	e18edf38db	Add a helper function for incrementing the `count` of cached ImageMasks While working on PR 14825, I couldn't help noticing that the code to increment the `count` for cached ImageMasks was repeated multiple times. Hence it makes sense, as far as I'm concerned, to move this into a helper function instead.	2022-04-24 11:10:02 +02:00
Tim van der Meij	752dee5caa	Merge pull request #14825 from Snuffleupagus/issue-14824 Ensure that worker-thread image caching doesn't break optional content (issue 14824)	2022-04-23 13:19:56 +02:00
Jonas Jenwald	6c229dffb1	Ensure that worker-thread image caching doesn't break optional content (issue 14824) Currently we only insert optionalContent-data into the operatorList the first time that an image is parsed, which will (in hindsight) obviously cause problems for cached images. Hence we also need to insert the optionalContent-data in the various worker-thread image caches, such that it can be accessed in the fast-paths that are used to skip re-parsing of images. In order to reduce the amount of repeated code, this patch also adds a new `OperatorList`-method that takes care of inserting the necessary data in the operatorList.	2022-04-22 14:49:16 +02:00
Jonas Jenwald	e723da7261	Ignore invalid /Encoding-entries when parsing fonts (issue 14821) In the referenced PDF document the fonts have /Encoding-entries that are Streams (containing completely bogus data), which are thus obviously not valid here. Hence, only when `ignoreErrors` is set, we'll now ignore these corrupt /Encoding-entries and fallback to the existing code to try and infer a usable encoding. Given that this is clearly a case of corrupt PDF documents, there's no guarantee that this will "fix" all such cases, however it's the best that we can do here and shouldn't really be worse than ignoring an entire font.	2022-04-22 11:49:03 +02:00
Calixte Denizet	4b7691baf6	Simplify min/max computations in constructPath (bug 1135277) - most of the time the current transform is a scaling one (modulo translation), hence it's possible to avoid applying the transform on each bbox and instead apply it a posteriori; - compute the bbox when it's possible in the worker.	2022-04-17 17:25:54 +02:00
Calixte Denizet	f62d961dfe	Improve performances with image masks (bug 857031) - it's the second part of the fix for https://bugzilla.mozilla.org/show_bug.cgi?id=857031; - some image masks can be used several times but at different positions; - an image needs to be pre-processed before being rendered: * rescale it; * use the fill color/pattern. - the two operations above are time consuming so we can cache the generated canvas; - the cache key is based on the current transform matrix (without the translation part) and the current fill color when it isn't a pattern. - the rendering of the pdf in the above bug is much faster than without this patch.	2022-04-16 20:48:39 +02:00
Tim van der Meij	cdb3481d6c	Merge pull request #14764 from apeltop/correct-typos Correct typos	2022-04-10 14:55:08 +02:00
Calixte Denizet	040fcae5ab	Improve performance with image masks (bug 857031) - it aims to partially fix the performance issue reported in https://bugzilla.mozilla.org/show_bug.cgi?id=857031; - the idea is to avoid using byte arrays and instead use ImageBitmap, which is much faster to draw: * an ImageBitmap is Transferable which means that it can be built in the worker instead of in the main thread: - this is achieved by using an OffscreenCanvas when it's available, there is a bug to enable them for pdf.js: https://bugzilla.mozilla.org/show_bug.cgi?id=1763330; - or by using createImageBitmap: in Firefox a task is sent to the main thread to build the bitmap so it's slightly slower than using an OffscreenCanvas. * it's transferred from the worker to the main thread by "reference"; * the byte buffers used to create the image data have a very short lifetime and ergo the memory used is globally less than before. - Use the localImageCache for the mask; - Fix the pdf issue4436r.pdf: it was expected to have a binary stream for the image; - Move the singlePixel trick from operator_list to image: this way we can use this trick even if it isn't in a set as defined in operator_list.	2022-04-09 18:26:26 +02:00
apeltop	a97dd26389	Correct typos	2022-04-09 09:43:18 +09:00
Calixte Denizet	18e79e3c0b	[text selection] Add the whitespaces present in the pdf in the text chunk - it aims to fix issue #14627; - the basic idea of the recent text refactoring was to only consider the rendered visible whitespaces. But sometimes, the heuristics aren't correct and although some whitespaces are in the text stream they weren't in the text chunks because they were too small. Hence we added some exceptions, for example, we always add a whitespace when it is between two non-whitespace chars but only when in the same Tj. So basically, this patch removes the constraint to have the chars in the same Tj (by using a circular buffer to save the last two chars) but doesn't add a space when the visible space is really too small (hence `NOT_A_SPACE_FACTOR`).	2022-03-27 14:34:56 +02:00
Jonas Jenwald	73d2ddac0d	Update npm packages Note that the Prettier update made it possible to move a couple of comments after `default:`-cases back to their original/intended positions, please see https://prettier.io/blog/2022/03/16/2.6.0.html	2022-03-20 10:59:13 +01:00
Jonas Jenwald	c0736647f9	Add general iteration support in the `RefSet` and `RefSetCache` classes This patch removes the existing `forEach` methods, in favor of making the classes properly iterable instead. Given that the classes are using a `Set` respectively a `Map` internally, implementing this is very easy/efficient and allows us to simplify some existing code.	2022-03-18 14:27:34 +01:00
Jonas Jenwald	d0d5c596fb	When `stopAtErrors` is set, throw rather than warn when exceeding `maxImageSize` (issue 14626) The situation described in issue 14626 seems like a fairly special case, and it thus seems reasonable that we simply follow the same pattern as elsewhere in the `PartialEvaluator` when the `stopAtErrors` API-option is being used.	2022-03-03 13:11:29 +01:00
Jonas Jenwald	99cd24ce3e	Remove the `isString` helper function The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isString`-calls.	2022-02-26 16:33:41 +01:00
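
Several of the commits above (1fc09f0235, bb75b36b77, 9ac4536693, 6538409282, f6db7975c5) swap older JavaScript idioms for newer built-ins. The snippet below is only an illustrative sketch of those idioms on made-up data; the variable names and values are assumptions and none of it is taken from the pdf.js sources.

```javascript
// Hypothetical input, chosen only for illustration.
const fontName = "Times New Roman, Bold";

// `replace` with a string pattern only touches the first occurrence.
const firstOnly = fontName.replace(" ", "_");

// `replaceAll` with a global regular expression replaces every occurrence,
// and the method name makes that intent obvious at the call-site.
// (Passing a non-global regular expression to `replaceAll` throws a TypeError.)
const everySpace = fontName.replaceAll(/ /g, "_");

// `String.prototype.search` returns an index (-1 when nothing matches),
// while `RegExp.prototype.test` returns the boolean that is usually wanted.
const hasComma = /,/.test(fontName);

// `Array.prototype.at` accepts negative indices, so the last element
// no longer needs `parts[parts.length - 1]`.
const parts = fontName.split(/,\s*/);
const lastPart = parts.at(-1);

// Spread syntax replaces the older `Math.max.apply(null, widths)` form.
const widths = [250, 500, 333];
const maxWidth = Math.max(...widths);

console.log(firstOnly, everySpace, hasComma, lastPart, maxWidth);
```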
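
Commit c0736647f9 describes dropping `forEach` methods in favor of making classes that wrap a `Set` or a `Map` properly iterable. A minimal sketch of that pattern might look as follows; `SimpleRefSet`, `SimpleRefCache`, their methods, and the string keys are hypothetical stand-ins, not the actual `RefSet` and `RefSetCache` implementations.

```javascript
// Hypothetical stand-in for a set of references, backed by a Set.
class SimpleRefSet {
  #set = new Set();

  has(ref) {
    return this.#set.has(ref);
  }

  put(ref) {
    this.#set.add(ref);
  }

  // Delegating to the Set's own iterator makes the class usable with
  // `for...of` and spread syntax, so no custom `forEach` is needed.
  [Symbol.iterator]() {
    return this.#set.values();
  }
}

// Hypothetical stand-in for a reference-keyed cache, backed by a Map.
class SimpleRefCache {
  #map = new Map();

  get(ref) {
    return this.#map.get(ref);
  }

  put(ref, obj) {
    this.#map.set(ref, obj);
  }

  // Assumption for this sketch: iteration yields the cached values.
  [Symbol.iterator]() {
    return this.#map.values();
  }
}

const refs = new SimpleRefSet();
refs.put("12R0");
refs.put("7R0");
refs.put("12R0"); // Duplicate, ignored by the underlying Set.

const cache = new SimpleRefCache();
cache.put("12R0", { type: "Font" });
cache.put("7R0", { type: "Image" });

const seenRefs = [...refs];
const cachedTypes = [];
for (const entry of cache) {
  cachedTypes.push(entry.type);
}
console.log(seenRefs, cachedTypes);
```

Because both classes reuse the iterator of the built-in collection, insertion order is preserved and no intermediate array is created during iteration.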
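
Commit c33b8d7692 caches a lazily computed, normalized unicode value on each `Glyph` instance and quotes a 99.1 percent cache hit-rate for 69236 characters drawn from 595 unique instances. The sketch below illustrates the general lazy-getter pattern and reproduces that arithmetic. `LazyGlyph` is a hypothetical class, and the built-in `String.prototype.normalize("NFKC")` is used purely as a stand-in for whatever normalization the real code performs.

```javascript
// Hypothetical glyph-like class; not the pdf.js `Glyph` implementation.
class LazyGlyph {
  // Counts how often the (stand-in) normalization actually runs.
  static normalizeCalls = 0;

  #normalized = null;

  constructor(unicode) {
    this.unicode = unicode;
  }

  // Computed on first access only; later accesses reuse the stored value.
  get normalizedUnicode() {
    if (this.#normalized === null) {
      LazyGlyph.normalizeCalls++;
      // Stand-in normalization, an assumption made for this sketch.
      this.#normalized = this.unicode.normalize("NFKC");
    }
    return this.#normalized;
  }
}

// U+FB01 is the "fi" ligature, which NFKC decomposes into "f" + "i".
const ligature = new LazyGlyph("\uFB01");
const firstRead = ligature.normalizedUnicode;
const secondRead = ligature.normalizedUnicode;

// Hit-rate from the figures in the commit message: each unique glyph is
// normalized once (a miss), every other character lookup is a hit.
const totalChars = 69236;
const uniqueGlyphs = 595;
const hitRatePercent = ((1 - uniqueGlyphs / totalChars) * 100).toFixed(1);

console.log(firstRead, secondRead, LazyGlyph.normalizeCalls, hitRatePercent);
```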
