Sakurai/pdf.js - pdf.js - Gitea on kemo

Sakurai/pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	77b258440b	Move some constants and helper functions `from src/core/fonts.js` and into their own file - `FontFlags`, is used in both `src/core/fonts.js` and `src/core/evaluator.js`. - `getFontType`, same as the above. - `MacStandardGlyphOrdering`, is a fairly large data-structure and `src/core/fonts.js` is already a very large file. - `recoverGlyphName`, a dependency of `type1FontGlyphMapping`; please see below. - `SEAC_ANALYSIS_ENABLED`, is used by both `Type1Font`, `CFFFont`, and unit-tests; please see below. - `type1FontGlyphMapping`, is used by both `Type1Font` and `CFFFont` which a later patch will move to their own files.	2021-05-02 21:00:29 +02:00
Jonas Jenwald	6912bb5e0a	Move the `IdentityToUnicodeMap`/`ToUnicodeMap` from `src/core/fonts.js` and into its own file	2021-05-02 21:00:29 +02:00
Tim van der Meij	f6f335173d	Merge pull request #13303 from Snuffleupagus/BaseStream Add an abstract base-class, which all the various Stream implementations inherit from	2021-05-01 19:13:36 +02:00
calixteman	af4dc55019	[api-minor] Fix the way to chunk the strings (#13257 ) - Improve chunking in order to fix some bugs where the spaces aren't here: * track the last position where a glyph has been drawn; * when a new glyph (first glyph in a chunk) is added then compare its position with the last saved one and add a space or break: - there are multiple ways to move the glyphs and to avoid to have to deal with all the different possibilities it's a way easier to just compare positions; - and so there is now one function (i.e. "compareWithLastPosition") where all the job is done. - Add some breaks in order to get lines; - Remove the multiple whites spaces: * some spaces were filled with several whites spaces and so it makes harder to find some sequences of words using the search tool; * other pdf readers replace spaces by one white space. Update src/core/evaluator.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com> Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-04-30 14:41:13 +02:00
Jonas Jenwald	30a22a168d	Move the `DecodeStream` and `StreamsSequenceStream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	da22146b95	Replace a bunch of `Array.prototype.forEach()` cases with `for...of` loops instead Using `for...of` is a modern and generally much nicer pattern, since it gets rid of unnecessary callback-functions. (In a couple of spots, a "regular" `for` loop had to be used.)	2021-04-24 13:00:19 +02:00
Jonas Jenwald	7fab73ed23	For CFF fonts without proper `ToUnicode`/`Encoding` data, utilize the "charset"/"Encoding"-data from the font file to improve text-selection (issue 13260) This patch extends the approach, implemented in PR 7550, to also apply to CFF fonts.	2021-04-20 20:48:44 +02:00
Brendan Dahl	ac3fa1e3d7	Merge pull request #13146 from calixteman/xfa_fonts XFA -- Load fonts permanently from the pdf	2021-04-16 12:55:12 -07:00
Calixte Denizet	7e9579045f	XFA -- Load fonts permanently from the pdf - Different fonts can be used in xfa and some of them are embedded in the pdf. - Load all the fonts in window.document. Update src/core/document.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com> Update src/core/worker.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-04-15 17:57:42 +02:00
Jani Pehkonen	3a96977ea8	Implement visibility expressions for optional content	2021-04-14 17:39:41 +03:00
Brendan Dahl	fc9501a637	Add support for basic structure tree for accessibility. When a PDF is "marked" we now generate a separate DOM that represents the structure tree from the PDF. This DOM is inserted into the <canvas> element and allows screen readers to walk the tree and have more information about headings, images, links, etc. To link the structure tree DOM (which is empty) to the text layer aria-owns is used. This required modifying the text layer creation so that marked items are now tracked.	2021-04-09 09:56:28 -07:00
Jonas Jenwald	68d3a333ac	Change the `seenStyles` object, in `PartialEvaluator.getTextContent`, to a Set Given that what we actually want is only to keep track of the loadedFont-names, rather than storing any actual data, using an object isn't really necessary here. Furthermore, in the current code, we're also using `in` when checking if the data exists, which is generally less efficient than just checking for the value directly.	2021-04-05 10:34:02 +02:00
Jonas Jenwald	0eb1433c78	[api-minor] Change the format of the `fontName`-property, in `defaultAppearanceData`, on Annotation-instances (PR 12831 follow-up) Currently the `fontName`-property contains an actual /Name-instance, which is a problem given that its fallback value is an empty string; see `ca7f546828/src/core/default_appearance.js (L35)` The reason that this is a problem can be seen in `ca7f546828/src/core/primitives.js (L30-L34)`, since an empty string short-circuits the cache. Essentially, in PDF documents, a /Name-instance cannot be empty and the way that the `DefaultAppearanceEvaluator` does things is unfortunately not entirely correct. Hence the `fontName`-property is changed to instead contain a string, rather than a /Name-instance, which simplifies the code overall. Please note: I'm tagging this patch with "[api-minor]", since PR 12831 is included in the current pre-release (although we're not using the `fontName`-property in the display-layer).	2021-04-01 16:47:30 +02:00
Jonas Jenwald	1ee747a620	Remove unneeded `instanceof MissingDataException` checks The following checks are all unneeded, and could easily cause confusion when reading the code. (All of them are my fault as well, since I've sometimes added those checks without really thinking about the surrounding code.) - In `PartialEvaluator.hasBlendModes` there cannot be any `MissingDataException`s thrown, given that the `Page.getOperatorList` method waits for all the necessary /Resources to load first. Furthermore, note also that if an error is thrown from `PartialEvaluator.hasBlendModes` then it'd completely break rendering of that page, since any errors thrown from `Page.getOperatorList` are simply sent to the main-thread. - In `PartialEvaluator.handleColorN` there cannot be any `MissingDataException`s thrown, given that again the `Page.getOperatorList` method waits for all the necessary /Resources to load before operatorList parsing starts. - In `XRef.readXRef` there cannot be any `MissingDataException`s thrown, given that we're explicitly requesting (and waiting for) the entire document in `pdfManagerReady` (in `src/core/worker.js`) before re-parsing of a corrupt document starts.	2021-02-13 12:26:05 +01:00
Jonas Jenwald	31098c404d	Use `Math.hypot`, instead of `Math.sqrt` with manual squaring (#12973 ) When the PDF.js project started `Math.hypot` didn't exist yet, and until recently we still supported browsers (IE 11) without a native `Math.hypot` implementation; please see this compatibility information: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/hypot#browser_compatibility Furthermore, somewhat recently there were performance improvements of `Math.hypot` in Firefox; see https://bugzilla.mozilla.org/show_bug.cgi?id=1648820 Finally, this patch also replaces a couple of multiplications with the exponentiation operator.	2021-02-10 12:28:49 +01:00
Jonas Jenwald	e6fe8a7d53	Handle errors gracefully, in `PartialEvaluator.translateFont`, when fetching the font file (issue 9462) The third page of the referenced PDF document currently fails to render completely, since one of its font files fail to load. Since that error isn't handled, a large part of the text is thus missing which looks quite bad. By "replacing" the font data with an empty stream, we'll thus be able to fallback to rendering the text with a standard font (instead of using `ErrorFont`). While there's obviously no guarantee that things will look perfect, actually rendering the text at all should be an improvement in general. Also, print a warning in `PartialEvaluator.loadFont` when the `PartialEvaluator.translateFont` method rejects, since that'd have helped debug/fix the issue faster.	2021-02-06 19:44:53 +01:00
Tim van der Meij	e4e92d10e8	Merge pull request #12922 from Snuffleupagus/getTextContent-globalImageCache Ignore globally cached images in `PartialEvaluator.getTextContent` (PR 11930 follow-up)	2021-01-28 23:44:10 +01:00
Tim van der Meij	8805614a03	Merge pull request #12924 from brendandahl/fix-clone Fix font data clone error when pdfBug is enabled.	2021-01-28 23:42:12 +01:00
Jonas Jenwald	72da2aa166	Ignore globally cached images in `PartialEvaluator.getTextContent` (PR 11930 follow-up) Given that we'll only cache `/XObject`s of the `Image`-type globally, we can utilize that in `PartialEvaluator.getTextContent` as well. This way, in cases such as e.g. issue 12098, we can avoid having to fetch/parse `/XObject`s that we already know to be `Image`s. This is helpful, since `Stream`s are not cached on the `XRef` instance (given their potential size) and the lookup can thus be somewhat expensive in general. Also, skip a redundant `RefSetCache.has` check in the `GlobalImageCache.getData` method.	2021-01-28 10:19:26 +01:00
Brendan Dahl	52fb5abb0b	Fix font data clone error when pdfBug is enabled. The widths property should be an object to match what metrics returns. In ZapfDingbats.pdf I was getting a data clone error with pdfBug enabled. In buildCharCodeToWidth() there was an encoding with the name "at" which is also the name of a method on an array. buildCharCodeToWidth assumes an object is passed in, so when it checked for the "at" property, it found the method and copied it over. This only seemed to affect Firefox.	2021-01-27 14:38:43 -08:00
Jonas Jenwald	1ab6d2c604	Improve global image caching for small images (PR 11912 follow-up, issue 12098) When implementing the `GlobalImageCache` functionality I was mostly worried about the effect of very large images, hence the maximum number of cached images were purposely kept quite low[1]. However, there's one fairly obvious problem with that approach: In documents with hundreds, or even thousands, of small images the `GlobalImageCache` as implemented becomes essentially pointless. Hence this patch, where the `GlobalImageCache`-implementation is changed in the following ways: - We're still guaranteed to be able to cache a minimum number of images, set to `10` (similar as before). - If the total size of all the cached image data is below a threshold[2], we're allowed to cache additional images. This patch thus improve, but doesn't completely fix, issue 12098. Note that that document is created by a very poor PDF generator, since every single page contains the entire document (with all of its /Resources) and to create the individual pages clipping is used.[3] --- [1] Currently set to `10` images; imagine what would happen to overall memory usage if we encountered e.g. 50 images each 10 MB in size. [2] This value was chosen, somewhat randomly, to be `40` megabytes; basically five times the [maximum individual image size per page](`6249ef517d/src/display/api.js (L2483-L2484)`). [3] This surely has to be some kind of record w.r.t. how badly PDF generators can mess things up...	2021-01-26 12:00:12 +01:00
Jonas Jenwald	8137c0547d	Fix the `gStateObj` lookup in `TranslatedFont._removeType3ColorOperators` (PR 12718 follow-up) As can be seen in `2cba290361/src/core/evaluator.js (L986)` the `gStateObj` (which is actually an Array despite its name), is wrapped in Array when it's inserted into the OperatorList. Hence we obviously need to take this into account when accessing it in `TranslatedFont._removeType3ColorOperators`; this mistake happened because we don't have any test-cases for this particular code-path as far as I know.	2021-01-22 12:27:38 +01:00
calixteman	1039698697	Add a parser to get font data from the default appearance (#12831 ) * Add a parser to get font data from the default appearance - pdfium & poppler use a special parser too to get these info. * Update src/core/default_appearance.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com> Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-01-21 20:15:31 +01:00
Jonas Jenwald	78c32c2697	Improve the handling of errors, in `PartialEvaluator.loadFont`, occuring in `PartialEvaluator.preEvaluateFont` (issue 12823) Currently any errors thrown in `preEvaluateFont`, which is a synchronous method, will not be handled at all in the `loadFont` method and we were thus failing to return an `ErrorFont`-instance as intended here. Also, add an explicit check in `PartialEvaluator.preEvaluateFont` to ensure that Type0-fonts always have a valid dictionary.	2021-01-07 11:38:38 +01:00
Jonas Jenwald	67e5db75d8	Ignore color-operators in Type3 glyphs beginning with a `d1` operator (issue 12705) Please refer to the PDF specification at https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1977497 and https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3998470 This patch removes the color-operators in the evaluator, since that should be more efficient than doing it repeatedly in the main-thread when rendering the Type3 glyphs.	2020-12-11 15:49:13 +01:00
Jonas Jenwald	082cd8fc6c	Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in `PartialEvaluator.hasBlendModes` The `PartialEvaluator.hasBlendModes` method is necessary to determine if there's any blend modes on a page, which unfortunately requires synchronous parsing of the /Resources of each page before its rendering can start (see the "StartRenderPage"-message). In practice it's not uncommon for certain /Resources-entries to be found on more than one page (referenced via the XRef-table), which thus leads to unnecessary re-fetching/re-parsing of data in `PartialEvaluator.hasBlendModes`. To improve performance, especially in pathological cases, we can cache /Resources-entries when it's absolutely clear that they do not contain any blend modes at all[1]. This way, subsequent `PartialEvaluator.hasBlendModes` calls can be made significantly more efficient. This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf: ``` [ { "id": "issue6961", "file": "../web/pdfs/issue6961.pdf", "md5": "a80e4357a8fda758d96c2c76f2980b03", "rounds": 100, "type": "eq" } ] ``` which gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, page, stat -- browser \| page \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ---- \| ------------ \| ----- \| ------------ \| ----------- \| ---- \| ------ \| ------------- firefox \| 0 \| Overall \| 100 \| 1034 \| 555 \| -480 \| -46.39 \| faster firefox \| 0 \| Page Request \| 100 \| 489 \| 7 \| -482 \| -98.67 \| faster firefox \| 0 \| Rendering \| 100 \| 545 \| 548 \| 2 \| 0.45 \| firefox \| 1 \| Overall \| 100 \| 912 \| 428 \| -484 \| -53.06 \| faster firefox \| 1 \| Page Request \| 100 \| 487 \| 1 \| -486 \| -99.77 \| faster firefox \| 1 \| Rendering \| 100 \| 425 \| 427 \| 2 \| 0.51 \| ``` --- [1] In the case where blend modes are found, it becomes a lot more difficult to know if it's generally safe to skip /Resources-entries. Hence we don't cache anything in that case, however note that most document/pages do not utilize blend modes anyway.	2020-11-05 16:59:08 +01:00
Jonas Jenwald	46e94cad17	Fix some errors reported by the ESLint `no-useless-escape` rule This patch removes unnecessary escape-sequence in (mostly) strings, as a first step, since the ones in regular expressions probably requires more careful testing (just in case). The only exception is a regular expression in `src/core/annotation.js`, since we should have both unit- and reference-tests for this code and given [this information on MDN](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes#Types): > Inside a character set, the dot loses its special meaning and matches a literal dot. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-useless-escape	2020-10-29 15:40:40 +01:00
Tim van der Meij	b4ca3d55b8	Merge pull request #12508 from calixteman/button_fallback_font Fallback font for buttons must be ZapfDingbats.	2020-10-24 18:56:12 +02:00
Jonas Jenwald	b478d3e7b9	Improve argument/name handling when parsing TilingPatterns (PR 12458 follow-up) - Handle the arguments correctly in `PartialEvaluator.handleColorN`. For TilingPatterns with a base-ColorSpace, we're currently using the `args` when computing the color. However, as can be seen we're passing the Array as-is to the `ColorSpace.getRgb` method, which means that the `Name` is included as well.[1] Thankfully this hasn't, as far as I know, caused any actual bugs, but that may be more luck than anything else given how the `ColorSpace` code is implemented. This can be easily fixed though, simply by popping the `Name`-object off of the `args` Array. - Cache TilingPatterns using the `Name`-string, rather than the object directly. This is not only consistent with other caches in `PartialEvaluator`, but importantly it also ensures that the cache lookup always works correctly. Note that since `Name`-objects, similar to other primitives, uses a cache themselves a manually triggered `cleanup`-call could thus (theoretically) cause the `LocalTilingPatternCache` to not find an existing entry. While the likelihood of this happening is extremely small, it's still something that we should fix. --- [1] The `args` Array can e.g. look like this: `[0.043, 0.09, 0.188, 0.004, /P1]`, which means that we're passing in the `Name`-object to the `ColorSpace` method.	2020-10-24 13:49:46 +02:00
Calixte Denizet	37c86b2daa	Fallback font for buttons must be ZapfDingbats. Fix bug https://bugzilla.mozilla.org/show_bug.cgi?id=1669099.	2020-10-24 12:00:03 +02:00
Jonas Jenwald	f956d0a96a	Stop caching the parsed Font data on its `Dict` object (PR 7347 follow-up) Given that all fonts are, ever since PR 7347, now cached in the "normal" `fontCache` there's actually no reason for the special `font.translated` construction. (Given how Objects in JavaScript are references, rather than raw values, the old code shouldn't have caused any significant memory overhead.) Instead we can simply store the `cacheKey`, which is a simple string, on only the Font `Dict`s where it's needed and thus look-up all fonts using the `fontCache` instead.	2020-10-16 17:45:01 +02:00
Jonas Jenwald	bc6b47a50e	Convert `PartialEvaluator.translateFont` to an `async` method This allows us to make a slight simplification in `PartialEvaluator.loadFont`, which thus removes an old TODO-comment from the method. Furthermore, in `PartialEvaluator.translateFont`, the CMap-handling is now limited to only composite fonts to avoid having to wait for a "dummy"-Promise for most fonts.	2020-10-15 09:42:58 +02:00
Jonas Jenwald	30e8d5dea1	Add local caching of TilingPatterns in `PartialEvaluator.getOperatorList` (issue 2765 and 8473) In practice it's not uncommon for PDF documents to re-use the same TilingPatterns more than once, and parsing them is essentially equal to parsing of a (small) page since a `getOperatorList` call is required. By caching the internal TilingPattern representation we can thus avoid having to re-parse the same data over and over, and there's also less asynchronous parsing required for repeated TilingPatterns. Initially I had intended to include (standard) benchmark results with this patch, however it's not entirely clear that this is actually necessary here given the preliminary results. When testing this manually in the development viewer, using `pdfBug=Stats`, the following (approximate) reduction in rendering times were observed when comparing `master` against this patch: - http://pubs.usgs.gov/sim/3067/pdf/sim3067sheet-2.pdf (from issue 2765): `6800 ms` -> `4100 ms`. - https://github.com/mozilla/pdf.js/files/1046131/stepped.pdf (from issue 8473): `54000 ms` -> `13000 ms` - https://github.com/mozilla/pdf.js/files/1046130/proof.pdf (from issue 8473): `5900 ms` -> `2500 ms` As always, whenever you're dealing with documents which are "slow", there's usually a certain level of subjectivity involved with regards to what's deemed acceptable performance. Hence it's not clear to me that we want to regard any of the referenced issues as fixed, however the improvements are significant enough to warrant caching of TilingPatterns in my opinion.	2020-10-08 18:43:21 +02:00
Jonas Jenwald	9416b14e8b	Re-factor how the ESLint `no-var` rule is enabled in the `src/` folder This simplifies/consolidates the ESLint configuration slightly in the `src/` folder, and prevents the addition of any new files where `var` is being used.[1] Hence we no longer need to manually add `/* eslint no-var: error */` in files, which is easy to forget, and can instead disable the rule in the `src/core/` files where `var` is still in use. --- [1] Obviously the `no-var` rule can, in the same way as every other rule, be disabled on a case-by-case basis where actually necessary.	2020-10-03 20:15:29 +02:00
Jonas Jenwald	784a420027	Add support, in `Dict.merge`, for merging of "sub"-dictionaries This allows for merging of dictionaries one level deeper than previously. This could be useful e.g. for /Resources dictionaries, where you want to e.g. merge their respective /Font dictionaries (and other) together rather than picking just the first one.	2020-08-30 23:18:32 +02:00
Jonas Jenwald	1058f16605	Add (basic) support for transfer functions to Images (issue 6931, bug 1149713) This is similar to the existing transfer function support for SMasks, but extended to simple image data. Please note that the extra amount of data now being sent to the worker-thread, for affected /ExtGState entries, is limited to at most 4 `Uint8Array`s each with a length of 256 elements. Refer to https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G9.1658137 for additional details.	2020-08-17 10:34:12 +02:00
Jonas Jenwald	9d3e046a4f	Don't cache /ExtGState entries that contain fonts (PR 12087 follow-up) I completely overlooked the fact that `PartialEvaluator.handleSetFont` also updates the current `state`, which means that currently we're not actually handling font data correctly for cached /ExtGState data. (Thankfully, using /ExtGState to set a font is somewhat rare in practice.)	2020-08-17 08:17:25 +02:00
Calixte Denizet	1747d259f9	Support textfield and choice widgets for printing	2020-08-06 14:45:23 +02:00
Brendan Dahl	ac494a2278	Add support for optional marked content. Add a new method to the API to get the optional content configuration. Add a new render task param that accepts the above configuration. For now, the optional content is not controllable by the user in the viewer, but renders with the default configuration in the PDF. All of the test files added exhibit different uses of optional content. Fixes #269. Fix test to work with optional content. - Change the stopAtErrors test to ensure the operator list has something, instead of asserting the exact number of operators.	2020-08-04 09:26:55 -07:00
Jonas Jenwald	835b5ffddd	Only check `isType3Font` the first time that `TranslatedFont.loadType3Data` is called If the `TranslatedFont.type3Loaded` property exists, then you already know that the font must be a Type3 one.	2020-07-27 13:20:15 +02:00
Jonas Jenwald	f3ff526019	Send/receive Type3 images the same way as other globally-cached images There's quite frankly no particular reason to special-case Type3-fonts with image resources, which are very rare anyway, now that we have a general mechanism for sending/receiving images globally.	2020-07-27 13:20:15 +02:00
Jonas Jenwald	7c9d0d5939	Improve how Type3-fonts with dependencies are handled While the `CharProcs` streams of Type3-fonts usually don't rely on dependencies, such as e.g. images, it does happen in some cases. Currently any dependencies are simply appended to the parent operatorList, which in practice means only the operatorList of the first page where the Type3-font is being used. However, there's one thing that's slightly unfortunate with that approach: Since fonts are global to the PDF document, we really ought to ensure that any Type3 dependencies are appended to the operatorList of all pages where the Type3-font is being used. Otherwise there's a theoretical risk that, if one page has its rendering paused, another page may try to use a Type3-font whose dependencies are not yet fully resolved. In that case there would be errors, since Type3 operatorLists are executed synchronously. Hence this patch, which ensures that all relevant pages will have Type3 dependencies appended to the main operatorList. (Note here that the `OperatorList.addDependencies` method, via `OperatorList.addDependency`, ensures that a dependency is only added once to any operatorList.) Finally, these changes also remove the need for the "waiting for the main-thread"-hack that was added to `PartialEvaluator.buildPaintImageXObject` as part of fixing issue 10717.	2020-07-27 13:20:13 +02:00
Jonas Jenwald	ea8e432c45	Add a `getRawValues` method, to `Dict` instances, to provide an easier way of getting all raw values When the old `Dict.getAll()` method was removed, it was replaced with a `Dict.getKeys()` call and `Dict.get(...)` calls (in a loop). While this pattern obviously makes a lot of sense in many cases, there's some instances where we actually want the raw `Dict` values (i.e. `Ref`s where applicable). In those cases, `Dict.getRaw(...)` calls are instead used within the loop. However, by introducing a new `Dict.getRawValues()` method we can reduce the number of (strictly unnecessary) function calls by simply getting the raw `Dict` values directly.	2020-07-17 16:32:00 +02:00
Tim van der Meij	e63d1ebff5	Merge pull request #12087 from Snuffleupagus/LocalGStateCache Add local caching of "simple" Graphics State (ExtGState) data in `PartialEvaluator.{getOperatorList, getTextContent}` (issue 2813)	2020-07-17 16:02:45 +02:00
Jonas Jenwald	b3480842b3	Use a `RefSet`, rather than a plain Object, for tracking already processed nodes in `PartialEvaluator.hasBlendModes`	2020-07-17 09:52:36 +02:00
Jonas Jenwald	03547b5633	Change `PartialEvaluator.setGState` to an `async` method Since this method calls `Dict.get` to fetch data, there could thus be `Error`s thrown in corrupt PDF documents when attempting to resolve an indirect object. To ensure that this won't ever become a problem, we change the method to be `async` such that a rejected Promise would be returned and general OperatorList parsing won't break.	2020-07-15 14:27:18 +02:00
Jonas Jenwald	f20aeb9343	Slightly simplify the code in `PartialEvaluator.hasBlendModes`, e.g. by using `for...of` loops - Replace the existing loops with `for...of` variants instead. - Make use of `continue`, to reduce indentation and to make the code (slightly) easier to follow, when checking `/Resources` entries.	2020-07-15 12:47:11 +02:00
Jonas Jenwald	15fa3f8518	Remove a redundant `/XObject` stream dictionary `objId` check in `PartialEvaluator.hasBlendModes` (PR 6971 follow-up) This case should no longer happen, given the `instanceof Ref` branch just above (added in PR 6971). Also, I've run the entire test-suite locally with `continue` replaced by `throw new Error(...)` and didn't find any problems.	2020-07-15 12:47:11 +02:00
Jonas Jenwald	84476da26e	Handle lookup errors "silently" in `PartialEvaluator.hasBlendModes` (PR 11680 follow-up) Given that this method is used during what's essentially a pre-parsing stage, before the actual OperatorList parsing occurs, on second thought it doesn't seem at all necessary to warn and trigger fallback in cases where there's lookup errors. Please note: Any any errors will still be either suppressed or thrown, according to the `ignoreErrors` option, during the actual OperatorList parsing.	2020-07-15 12:47:07 +02:00
Jonas Jenwald	981ff41b5f	Add local caching of non-font Graphics State (ExtGState) data in `PartialEvaluator.getTextContent` It turns out that `getTextContent` suffers from similar problems with repeated GStates as `getOperatorList`; please see the previous patch. While only `/ExtGState` resources containing Fonts will actually be parsed by `PartialEvaluator.getTextContent`, we're still forced to fetch/validate repeated `/ExtGState` resources even though most of them won't affect the textContent (since they mostly contain purely graphical state). With these changes we also no longer need to immediately reset the current text-state when encountering a `setGState` operator, which may thus improve text-selection in some cases.	2020-07-14 10:34:43 +02:00

1 2 3 4 5 ...