Sakurai/pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	36593d6bbc	Move `JpegStream` and `JpxStream` to their own files	2017-11-11 11:22:16 +01:00
Yury Delendik	85f544f55a	Moves OperatorList and QueueOptimizer into separate file.	2017-10-30 13:29:58 -05:00
Jonas Jenwald	b1472cddbb	Allow `getOperatorList`/`getTextContent` to skip errors when parsing broken XObjects (issue 8702, issue 8704) This patch makes use of the existing `ignoreErrors` property in `src/core/evaluator.js`, see PRs 8240 and 8441, thus allowing us to attempt to recover as much as possible of a page even when it contains broken XObjects. Fixes 8702. Fixes 8704.	2017-09-29 17:14:21 +02:00
Jonas Jenwald	b8ec518a1e	Split the existing `PDFFunction` in two classes, a private `PDFFunction` and a public `PDFFunctionFactory`, and utilize the latter in `PDFDocument` to allow various code to access the methods of `PDFFunction` Follow-up to PR 8909. This requires us to pass around `pdfFunctionFactory` to quite a lot of existing code, however I don't see another way of handling this while still guaranteeing that we can access `PDFFunction` as freely as in the old code. Please note that the patch passes all tests locally (unit, font, reference), and I very much hope that we have sufficient test-coverage for the code in question to catch any typos/mistakes in the re-factoring.	2017-09-29 15:30:53 +02:00
Jonas Jenwald	5c961c76bb	Remove the unused `inline` parameter from various methods/functions in `PDFImage`, and change a couple of methods to use Objects rather than plain parameters The `inline` parameter is passed to a number of methods/functions in `PDFImage`, despite not actually being used. Its value is never checked, nor is it ever assigned to the current `PDFImage` instance (i.e. no `this.inline = inline` exists). Looking briefly at the history of this code, I was also unable to find a point in time where `inline` was being used. As far as I'm concerned, `inline` does nothing more than add clutter to already very unwieldy method/function signatures, hence why I'm proposing that we just remove it. To further simplify call-sites using `PDFImage`/`NativeImageDecoder`, a number of methods/functions are changed to take Objects rather than a bunch of (somewhat) randomly ordered parameters.	2017-09-29 15:30:40 +02:00
Brendan Dahl	10ba292b46	Use font's default width even when 0. Bug 1392647 has a PDF where the default width of the font is 0. It draws some charcodes that don't have glyphs, but we were wrongly using the 1000 default width for these charcodes causing some text to be overlapping.	2017-09-20 11:38:30 -07:00
Jonas Jenwald	dc926ffc0f	Check `isEvalSupported`, and test that `eval` is actually supported, before attempting to use the `PostScriptCompiler` (issue 5573) Currently `PDFFunction` is implemented (basically) like a class with only `static` methods. Since it's used directly in a number of different `src/core/` files, attempting to pass in `isEvalSupported` would result in code that's very messy, not to mention difficult to maintain (since every single `PDFFunction` method call would need to include a `isEvalSupported` argument). Rather than having to wait for a possible re-factoring of `PDFFunction` that would avoid the above problems by design, it probably makes sense to at least set `isEvalSupported` globally for `PDFFunction`. Please note that there's one caveat with this solution: If `PDFJS.getDocument` is used to open multiple files simultaneously, with different `PDFJS.isEvalSupported` values set before each call, then the last one will always win. However, that seems like enough of an edge-case that we shouldn't have to worry about it. Besides, since we'll also test that `eval` is actually supported, it should be fine. Fixes 5573.	2017-09-15 12:02:45 +02:00
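The check described in dc926ffc0f can be pictured roughly as follows. This is a minimal sketch, not the PDF.js code: `canUseEval` and `makeScaler` are hypothetical names, and the only idea taken from the commit message is that the flag is consulted and that `eval` support is actually tested before a compiled code path is used.

```javascript
// Sketch only: hypothetical helper names, not the PDF.js implementation.
function canUseEval() {
  try {
    // In environments where a content security policy blocks eval,
    // constructing a Function from a string throws.
    new Function("");
    return true;
  } catch (ex) {
    return false;
  }
}

// Returns a compiled function when allowed and possible,
// otherwise an ordinary closure with the same behaviour.
function makeScaler(factor, isEvalSupported) {
  if (isEvalSupported && canUseEval()) {
    return new Function("x", "return x * " + Number(factor) + ";");
  }
  return function (x) {
    return x * factor;
  };
}

const compiledScaler = makeScaler(2, true);
const interpretedScaler = makeScaler(2, false);
```

Both paths return the same result, so a caller never needs to know which one was chosen.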
Jonas Jenwald	cfb4955a92	Replace the `isArray` helper function with the native `Array.isArray` function Follow-up to PR 8813.	2017-09-01 20:27:13 +02:00
Jonas Jenwald	093afd1212	Replace the `coded` property with `isType3Font` when building the font `properties` object in `PartialEvaluator.translateFont` This appears to simply have been forgotten in the re-factoring in PR 4815, where the `coded` property was renamed to the much more descriptive `isType3Font` property.	2017-08-08 14:03:02 +02:00
Jonas Jenwald	4729e96fb7	Remove leftover `args[0].code` checks from the `OPS.paintXObject` cases in evaluator.js From looking at blame, it seems that these checks became obsolete with PR 692 (which landed close to six years ago). Note how, after that PR, there's no longer anything being assigned to the `code` property of an Object.	2017-08-07 10:48:37 +02:00
Yury Delendik	a1dfbec532	Properly cancel streams and guard at getTextContent.	2017-08-03 16:36:46 -05:00
Jonas Jenwald	814fa1dee3	Remove most `assert()` calls (issue 8506) This replaces `assert` calls with `throw new FormatError()`/`throw new Error()`. In a few places, throwing an `Error` (which is what `assert` meant) isn't correct since the enclosing function is supposed to return a `Promise`, hence some cases were changed to `Promise.reject(...)` and similarly for `createPromiseCapability` instances.	2017-07-21 18:51:02 +02:00
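The distinction drawn in 814fa1dee3 between throwing and rejecting might look like the sketch below. The `FormatError` class here is a local stand-in defined only for this example, and `parseCount`/`loadResource` are hypothetical functions, not PDF.js APIs.

```javascript
// Local stand-in for illustration; defined here so the example is self-contained.
class FormatError extends Error {
  constructor(message) {
    super(message);
    this.name = "FormatError";
  }
}

// Synchronous code: throw directly instead of calling assert().
function parseCount(value) {
  if (!Number.isInteger(value) || value < 0) {
    throw new FormatError("Invalid count: " + value);
  }
  return value;
}

// Code that is expected to return a Promise: reject instead of throwing,
// so callers always receive a Promise.
function loadResource(name) {
  if (typeof name !== "string") {
    return Promise.reject(new Error("Resource name must be a string."));
  }
  return Promise.resolve({ name, });
}

// The failure is delivered through the Promise, not as a synchronous exception.
loadResource(42).catch((reason) => reason.message);
```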
Yury Delendik	d028c26210	Removes error()	2017-07-07 09:40:24 -05:00
Mukul Mishra	0c13d0ff46	Adds Streams API in getTextContent to stream data. This patch adds Streams API support in getTextContent so that we can stream data in chunks instead of fetching whole data from worker thread to main thread. This patch supports Streams API without changing the core functionality of getTextContent. Enqueue textContent directly at getTextContent in partialEvaluator. Adds desiredSize and ready property in streamSink.	2017-06-17 20:03:27 +05:30
Jonas Jenwald	e589834f13	Ensure that `TilingPattern`s have valid (non-zero) /BBox arrays (issue 8330) Fixes 8330.	2017-06-09 21:41:48 +02:00
Jonas Jenwald	a8c87f8019	Fix inconsistent spacing and trailing commas in objects in `src/core/` files, so we can enable the `comma-dangle` and `object-curly-spacing` ESLint rules later on Unfortunately this patch is fairly big, even though it only covers the `src/core` folder, but splitting it even further seemed difficult. http://eslint.org/docs/rules/comma-dangle http://eslint.org/docs/rules/object-curly-spacing Given that we currently have quite inconsistent object formatting, fixing this in one big patch probably wouldn't be feasible (since I cannot imagine anyone wanting to review that); hence I've opted to try and do this piecewise instead. Please note: This patch was created automatically, using the ESLint --fix command line option. In a couple of places this caused lines to become too long, and I've fixed those manually; please refer to the interdiff below for the only hand-edits in this patch.
```diff
diff --git a/src/core/evaluator.js b/src/core/evaluator.js
index abab9027..dcd3594b 100644
--- a/src/core/evaluator.js
+++ b/src/core/evaluator.js
@@ -2785,7 +2785,8 @@ var EvaluatorPreprocessor = (function EvaluatorPreprocessorClosure() {
     t['Tz'] = { id: OPS.setHScale, numArgs: 1, variableArgs: false, };
     t['TL'] = { id: OPS.setLeading, numArgs: 1, variableArgs: false, };
     t['Tf'] = { id: OPS.setFont, numArgs: 2, variableArgs: false, };
-    t['Tr'] = { id: OPS.setTextRenderingMode, numArgs: 1, variableArgs: false, };
+    t['Tr'] = { id: OPS.setTextRenderingMode, numArgs: 1,
+                variableArgs: false, };
     t['Ts'] = { id: OPS.setTextRise, numArgs: 1, variableArgs: false, };
     t['Td'] = { id: OPS.moveText, numArgs: 2, variableArgs: false, };
     t['TD'] = { id: OPS.setLeadingMoveText, numArgs: 2, variableArgs: false, };
diff --git a/src/core/jbig2.js b/src/core/jbig2.js
index 5a17d482..71671541 100644
--- a/src/core/jbig2.js
+++ b/src/core/jbig2.js
@@ -123,19 +123,22 @@ var Jbig2Image = (function Jbig2ImageClosure() {
      { x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, { x: -2, y: 0, },
      { x: -1, y: 0, }],
     [{ x: -3, y: -1, }, { x: -2, y: -1, }, { x: -1, y: -1, }, { x: 0, y: -1, },
-     { x: 1, y: -1, }, { x: -4, y: 0, }, { x: -3, y: 0, }, { x: -2, y: 0, }, { x: -1, y: 0, }]
+     { x: 1, y: -1, }, { x: -4, y: 0, }, { x: -3, y: 0, }, { x: -2, y: 0, },
+     { x: -1, y: 0, }]
   ];

   var RefinementTemplates = [
     {
       coding: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }],
-      reference: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, },
-                  { x: 1, y: 0, }, { x: -1, y: 1, }, { x: 0, y: 1, }, { x: 1, y: 1, }],
+      reference: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, },
+                  { x: 0, y: 0, }, { x: 1, y: 0, }, { x: -1, y: 1, },
+                  { x: 0, y: 1, }, { x: 1, y: 1, }],
     },
     {
-      coding: [{ x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }],
-      reference: [{ x: 0, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, }, { x: 1, y: 0, },
-                  { x: 0, y: 1, }, { x: 1, y: 1, }],
+      coding: [{ x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, },
+               { x: -1, y: 0, }],
+      reference: [{ x: 0, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, },
+                  { x: 1, y: 0, }, { x: 0, y: 1, }, { x: 1, y: 1, }],
     }
   ];
```	2017-06-02 11:20:19 +02:00
Jonas Jenwald	982b6aa65b	Convert the files in the `/src/core` folder to ES6 modules Please note that the `glyphlist.js` and `unicode.js` files are converted to CommonJS modules instead, since Babel cannot handle files that large and they are thus excluded from transpilation.	2017-05-30 22:06:21 +02:00
Yury Delendik	5dc8dcdc0f	Merge pull request #8388 from Snuffleupagus/issue-8380 Cache JPEG images, just as we do for other image formats, in `evaluator.js` (issue 8380)	2017-05-17 17:25:51 -05:00
巴里切罗	8d5d97264e	fix(svg) adjust strategy for decoding JPEG images	2017-05-08 11:32:44 +08:00
Jonas Jenwald	0c2ebda31c	Cache JPEG images, just as we do for other image formats, in `evaluator.js` (issue 8380) For some reason, we're putting all kinds of images except JPEG into the `imageCache` in `evaluator.js`.[1] This means that in the PDF file in issue 8380, we'll keep sending the same two small images[2] to the main-thread and decoding them over and over. This is obviously hugely inefficient! As can be seen from the discussion in the issue, the performance becomes extremely bad if the user has the addon "Adblock Plus" installed. However, even in a clean Firefox profile, the performance isn't that great. This patch not only addresses the performance implications of the "Adblock Plus" addon together with that particular PDF file, but it also improves the rendering times considerably for all users. Locally, with a clean profile, the rendering times are reduced from `~2000 ms` to `~500 ms` for my setup! Obviously, the general structure of the PDF file and its operator sequence is still hugely inefficient, however I'd say that the performance with this patch is good enough to consider the issue (as it stands) resolved.[3] Fixes 8380. --- [1] Not technically true, since inline images are cached from `parser.js`, but whatever :-) [2] The two JPEG images have dimensions 1x2, respectively 4x2. [3] To make this even more efficient, a new state would have to be added to the `QueueOptimizer`. Given that PDF files this stupid fortunately aren't too common, I'm not convinced that it's worth doing.	2017-05-07 13:07:41 +02:00
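The effect of caching described in 0c2ebda31c can be illustrated with a small sketch. The only name taken from the commit message is `imageCache`; `createCachedImageLoader`, the decode callback, and the image name `Im1` are assumptions made for this example, and the real evaluator code is structured differently.

```javascript
// Sketch: wrap a (hypothetical) decode function so that each named image
// is decoded once and then served from the cache.
function createCachedImageLoader(decodeImage) {
  const imageCache = new Map();
  return function loadImage(name, data) {
    if (imageCache.has(name)) {
      return imageCache.get(name);
    }
    const decoded = decodeImage(data);
    imageCache.set(name, decoded);
    return decoded;
  };
}

let decodeCalls = 0;
const loadImage = createCachedImageLoader((data) => {
  decodeCalls++;
  return { width: data.width, height: data.height, };
});

// The same small image requested twice is decoded only once.
const firstLookup = loadImage("Im1", { width: 1, height: 2, });
const secondLookup = loadImage("Im1", { width: 1, height: 2, });
```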
Jonas Jenwald	3e20d30afc	Change the signatures of the `PartialEvaluator` "constructor" and its `getOperatorList`/`getTextContent` methods to take parameter objects Currently these methods accept a large number of parameters, which creates quite unwieldy call-sites. When invoking them, you have to remember not only what arguments to supply, but also the correct order, to avoid runtime errors. Furthermore, since some of the parameters are optional, you also have to remember to pass e.g. `null` or `undefined` for those ones. Also, adding new parameters to these methods (which happens occasionally), often becomes unnecessarily tedious (based on personal experience). Please note that I do not think that we need/should convert every single method in `evaluator.js` (or elsewhere in `/core` files) to take parameter objects. However, in my opinion, once a method starts relying on approximately five parameters (or even more), passing them in individually becomes quite cumbersome. With these changes, I obviously needed to update the `evaluator_spec.js` unit-tests. The main change there, except the new method signatures[1], is that it's now re-using one `PartialEvaluator` instance, since I couldn't see any compelling reason for creating a new one in every single test. Note: If this patch is accepted, my intention is to (time permitting) see if it makes sense to convert additional methods in `evaluator.js` (and other `/core` files) in a similar fashion, but I figured that it'd be a good idea to limit the initial scope somewhat. --- [1] A fun fact here, note how the `PartialEvaluator` signature used in `evaluator_spec.js` wasn't even correct in the current `master`.	2017-05-03 12:10:20 +02:00
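The call-site difference that 3e20d30afc argues for can be sketched as below. The two functions are hypothetical and deliberately do not reproduce the actual `PartialEvaluator` signatures; the parameter names are placeholders chosen for the example.

```javascript
// Hypothetical illustration; not the actual PartialEvaluator signatures.

// Positional style: the caller must remember the order and pass
// `undefined`/`null` for optional arguments.
function describeTaskPositional(stream, task, resources, operatorList,
                                initialState) {
  return {
    stream, task, resources, operatorList,
    initialState: initialState || null,
  };
}

// Parameter-object style: order no longer matters and optional
// values can simply be left out.
function describeTask({ stream, task, resources, operatorList,
                        initialState = null, }) {
  return { stream, task, resources, operatorList, initialState, };
}

const positionalCall = describeTaskPositional("s", "t", {}, [], undefined);
const namedCall = describeTask({
  task: "t",
  stream: "s",
  operatorList: [],
  resources: {},
});
```

Adding a new optional parameter to `describeTask` would not require touching any existing call-site, which is the maintenance benefit the commit message describes.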
Jonas Jenwald	95bbc8101c	Replace unnecessary `bind(this)` and `var self = this` statements with arrow functions in `src/core/evaluator.js` Note that by using `let` instead of `var` in `PartialEvaluator.setGState` and `TranslatedFont.loadType3Data`, we can get rid of further `bind` usages since `let` is block-scoped. Also, the fact that `bind` wasn't used in the `Font` case inside of `setGState` is actually a bug which has been present ever since PR 5205, where a closure was replaced by a standard loop.[1] --- [1] I'm not aware of any bugs caused by this, but that is probably more a happy accident than anything else, since e.g. just removing the `bind` from the `SMask` case without using block-scoped variables causes test failures.	2017-05-01 20:29:44 +02:00
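Both points in 95bbc8101c, arrow functions replacing `var self = this` and `let` giving each loop iteration its own binding, can be seen in a small stand-alone sketch. `Counter`, `collectWithVar`, and `collectWithLet` are hypothetical names used only for illustration.

```javascript
function Counter() {
  this.count = 0;
}

// Old style: capture `this` manually for use inside the callback.
Counter.prototype.addAllOld = function (values) {
  var self = this;
  values.forEach(function (value) {
    self.count += value;
  });
};

// Arrow functions use the enclosing `this`, so no alias or bind() is needed.
Counter.prototype.addAll = function (values) {
  values.forEach((value) => {
    this.count += value;
  });
};

// With `var`, every closure created in the loop shares one variable.
function collectWithVar() {
  var callbacks = [];
  for (var i = 0; i < 3; i++) {
    callbacks.push(function () { return i; });
  }
  return callbacks.map((callback) => callback());
}

// With `let`, each iteration gets its own block-scoped binding.
function collectWithLet() {
  const callbacks = [];
  for (let i = 0; i < 3; i++) {
    callbacks.push(function () { return i; });
  }
  return callbacks.map((callback) => callback());
}
```

`collectWithVar()` yields `[3, 3, 3]` while `collectWithLet()` yields `[0, 1, 2]`, which is the kind of difference behind the loop bug mentioned in the footnote.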
Jonas Jenwald	afc74b0178	Enable the `object-shorthand` ESLint rule in `src/shared` Please see http://eslint.org/docs/rules/object-shorthand. For the most part, these changes are of the search-and-replace kind, and the previously enabled `no-undef` rule should complement the tests in helping ensure that no stupid errors crept into the patch.	2017-04-27 17:29:40 +02:00
Jonas Jenwald	fbe7b2eee7	Always ignore Type3 glyphs if their `OperatorList`s contain errors, regardless of the value of the `stopAtErrors` option Compared to the parsing of e.g. an entire page, it doesn't really make sense to only be able to render a Type3 glyph partially.	2017-04-11 08:59:22 +02:00
Jonas Jenwald	a39d636eb8	[api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815) Other PDF readers, e.g. Adobe Reader and PDFium (in Chrome), will attempt to render as much of a page as possible even if there are errors present. Currently we just bail as soon as the first error is hit, which means that we'll usually not render anything in these cases and just display a blank page instead. NOTE: This patch changes the default behaviour of the PDF.js API to always attempt to recover as much data as possible, even when encountering errors during e.g. `getOperatorList`/`getTextContent`, which thus improves our handling of corrupt PDF files and allows the default viewer to handle errors slightly more gracefully. In the event that an API consumer wishes to use the old behaviour, where we stop parsing as soon as an error is encountered, the `stopAtErrors` parameter can be set at `getDocument`. Fixes, inasmuch as it's possible since the PDF files are corrupt, e.g. issue 6342, issue 3795, and [bug 1130815](https://bugzilla.mozilla.org/show_bug.cgi?id=1130815) (and probably others too).	2017-04-11 08:59:22 +02:00
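The behavioural switch introduced in a39d636eb8 can be pictured with a deliberately simplified sketch. Only the option name `stopAtErrors` comes from the commit message; `runOperations`, the shape of the operations, and the choice to skip a failing operation and continue are assumptions for illustration, and the actual recovery logic in PDF.js is not shown here.

```javascript
// Simplified sketch: run a list of operations, either aborting on the
// first error (stopAtErrors) or recording it and carrying on.
function runOperations(operations, { stopAtErrors = false, } = {}) {
  const results = [];
  const warnings = [];
  for (const operation of operations) {
    try {
      results.push(operation());
    } catch (ex) {
      if (stopAtErrors) {
        throw ex; // old behaviour: bail out at the first error
      }
      warnings.push(ex.message); // default: keep whatever can be recovered
    }
  }
  return { results, warnings, };
}

const sampleOperations = [
  () => "moveTo",
  () => { throw new Error("broken data"); },
  () => "lineTo",
];

const lenientRun = runOperations(sampleOperations);
```

With the default options the two valid operations survive and the error is reported as a warning; passing `{ stopAtErrors: true }` makes the same input throw.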
Jonas Jenwald	10e5f766a2	Merge pull request #8266 from brendandahl/issue6652 Normalize blend mode names.	2017-04-11 08:54:42 +02:00
Brendan Dahl	4969b2ad97	Normalize blend mode names.	2017-04-10 16:18:08 -07:00
Jonas Jenwald	f41d80bdd3	Enable the `prefer-promise-reject-errors` ESLint rule See http://eslint.org/docs/rules/prefer-promise-reject-errors, note that this is similar to the already used `no-throw-literal` rule.	2017-04-08 11:47:22 +02:00
Jonas Jenwald	e229c21ce1	Remove unnecessary `xref` parameters from various method signatures in `PartialEvaluator`, since `this.xref` is already available in the relevant scope For reasons I don't pretend to understand, we're passing around `xref` arguments to a bunch of methods despite `this.xref` being available in `PartialEvaluator`. This patch is a small first step towards cleaning up the, often unwieldy, signatures of methods in `PartialEvaluator`.	2017-03-26 14:12:53 +02:00
Jonas Jenwald	e40fd63bd3	In `src/core/evaluator.js`, convert a couple of `if (!someVariable) { error(...); }` instances to `assert(someVariable);` instead Rather than, in a number of places, basically duplicating the logic of `assert` we can simply utilize the function directly instead.	2017-03-26 13:53:13 +02:00
Jonas Jenwald	a7c19d9cbb	Adjust the `yoda` ESLint rule to apply to inequalities as well I happened to notice that some inequalities had the wrong order, and was surprised since I thought that the `yoda` rule should have caught that. However, reading http://eslint.org/docs/rules/yoda#options a bit more closely than previously, it's quite obvious that the `onlyEquality` option does exactly what its name suggests. Hence I think that it makes sense to adjust the options such that only ranges are allowed instead.	2017-03-19 13:27:14 +01:00
Jonas Jenwald	111419a64a	Cache built-in binary CMap files in the worker (issue 4794)	2017-02-16 10:55:39 +01:00
Jonas Jenwald	769c1450b7	[api-minor] Refactor fetching of built-in CMaps to utilize a factory on the `display` side instead, to allow users of the API to provide a custom CMap loading factory (e.g. for use with Node.js) Currently the built-in CMap files are loaded in `src/core/cmap.js` using `XMLHttpRequest` directly. For some environments that might be a problem, hence this patch refactors that to instead use a factory to load built-in CMaps on the main thread and message the data to the worker thread. This is inspired by other recent work, e.g. the addition of the `CanvasFactory`, and to a large extent on the IRC discussion starting at http://logs.glob.uno/?c=mozilla%23pdfjs&s=12+Oct+2016&e=12+Oct+2016#c53010.	2017-02-16 10:55:35 +01:00
Jonas Jenwald	9c34d0aa8c	[api-minor] Add a `getDocument` parameter that allows disabling of the `NativeImageDecoder` (e.g. for use with Node.js) Note that I initially tried to add this as a parameter to the `PDFPageProxy.render` method, such that it could be passed to `PartialEvaluator.getOperatorList`. However, given all the different code-paths that call `getOperatorList` (there's a bunch only in `annotation.js`), this seemed to very quickly become unwieldy and thus difficult to maintain compared to simply using the existing `evaluatorOptions`.	2017-02-06 22:21:34 +01:00
Jonas Jenwald	52e0f51917	Enable the `no-unused-vars` ESLint rule Please see http://eslint.org/docs/rules/no-unused-vars; note that this patch purposely uses the same rule options as in `mozilla-central`, such that it fixes part of issue 7957. It wasn't, in my opinion, entirely straightforward to enable this rule compared to the already existing rules. In many cases a `var descriptiveName = ...` format was used (more or less) to document the code, and I chose to place the old variable name in a trailing comment to not lose that information. I welcome feedback on these changes, since it wasn't always entirely easy to know what changes made the most sense in every situation.	2017-01-29 23:23:17 +01:00
Jonas Jenwald	50c2856097	Move `EOF`/`isEOF` from core/parser.js to core/primitives.js Given the nature of `EOF` and `isEOF`, it seems to me that they really ought to be placed in `core/primitives.js` instead. In general, it doesn't seem great to have to depend on the entire `core/parser.js` file for such simple primitives/helper functions. In particular, while `core/ps_parser.js` is completely separate from `core/parser.js` with regards to its function, it still depends on the latter for just one primitive. Note that compared to e.g. PR 7389, this will not reduce the number of dependencies for `core/ps_parser`, however the new dependency IMHO makes more sense.	2017-01-27 13:37:48 +01:00
Jonas Jenwald	642d8621ef	Replace direct lookup of `uniquePrefix`/`idCounters`, in `Page` instances, with an `idFactory` containing a `createObjId` method instead We're currently making use of `uniquePrefix`/`idCounters` in multiple files, to create unique object id's, and adding a new occurrence of them requires some care to ensure that an object id isn't accidentally reused. Furthermore, having to pass around multiple parameters as we currently do seems like something you want to avoid. Instead, this patch adds a factory which means that there's only one thing that needs to be passed around. And since it's now only necessary to call a method in order to obtain a unique object id, the details are thus abstracted away at the call-sites which avoids accidental reuse of object id's. To test that this works as expected a very simple `Page` unit-test is added, and the existing `Annotation layer` tests are also adjusted slightly.	2017-01-09 23:16:25 +01:00
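A factory of the kind described in 642d8621ef could look roughly like this. The method name `createObjId` comes from the commit message; `createIdFactory`, the private counter, and the `p<pageIndex>_<n>` id format are assumptions made for the sketch.

```javascript
// Sketch: one object to pass around instead of a prefix plus counters.
// The id format below is an assumption made for this example.
function createIdFactory(pageIndex) {
  const uniquePrefix = "p" + pageIndex + "_";
  let objCounter = 0; // private, so call-sites cannot reuse an id by accident
  return {
    createObjId() {
      return uniquePrefix + (++objCounter);
    },
  };
}

const pageZeroFactory = createIdFactory(0);
const firstId = pageZeroFactory.createObjId();
const secondId = pageZeroFactory.createObjId();
```

Because the counter lives inside the closure, every caller that holds the same factory draws from the same sequence.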
Jonas Jenwald	4046d67fde	Enable the `no-else-return` ESLint rule Using `else` after `return` is not necessary, and can often lead to unnecessarily cluttered code. By using the `no-else-return` rule in ESLint we can avoid this pattern, see http://eslint.org/docs/rules/no-else-return.	2017-01-09 20:27:39 +01:00
Jonas Jenwald	ddea9a6b04	Improve the handling of `Encoding` dictionary, with `Differences` array, in `PartialEvaluator_preEvaluateFont` I recently happened to look at the code I wrote for PR 5964, which fixed [bug 1157493](https://bugzilla.mozilla.org/show_bug.cgi?id=1157493), and I quickly realized that the solution is way too simplistic. The fact that only using the `length` of a `Differences` array worked seems more like a happy accident for a particular set of font data, but could just as easily be incorrect for other PDF files. Note that in practice, the case where the `Encoding` entry is a regular `Dict` (and not a `Ref` or `Name`) is very rare, hence I don't think that we really need to worry about having to reparse this data. Also, the performance of this code-block is quite a bit better by updating the `hash` with the data from the entire `Differences` array, instead of at every loop iteration.	2016-12-28 21:32:54 +01:00
Ross Johnson	4537590033	Consistently apply textAdvanceScale during building of textContentItems for improved highlighting. Fixes #7878.	2016-12-14 21:02:19 -06:00
Jonas Jenwald	c5b06cb40d	Ensure that `PartialEvaluator_extractWidths` is able to handle indirect objects in all kinds of "width" data (issue 7855) Fixes 7855.	2016-11-29 20:49:07 +01:00
Jonas Jenwald	451956c0b1	Merge pull request #7628 from Snuffleupagus/issue-7580 Fallback to the `StandardEncoding` for Nonsymbolic fonts without `/Encoding` entry (issue 7580)	2016-11-29 12:37:36 +01:00
Jonas Jenwald	a930f9af15	For commands with too few arguments, clear out `args` if it's an Array instead of replacing it with `null` in `EvaluatorPreprocessor_read` (issue 7804) For `PartialEvaluator_getTextContent`, the same `args` Array should be re-used for every `EvaluatorPreprocessor_read` call. Hence we want to ensure that it's not accidentally replaced with `null` in `EvaluatorPreprocessor_read`, since otherwise corrupt PDF files (with too few arguments for certain commands) will cause errors in `PartialEvaluator_getTextContent`. Perhaps a micro-optimization, but this patch also changes two `!args` comparisons to `args === null`, since that should be a tiny bit more efficient.	2016-11-16 10:20:29 +01:00
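The array re-use that a930f9af15 protects can be sketched as follows. `readInto` and the shape of the `operation` object are hypothetical; the sketch only shows the idea of emptying the shared array in place rather than swapping it for `null`.

```javascript
// Sketch: fill a caller-owned `operation.args` array, keeping the same
// Array instance even when a command has too few arguments.
function readInto(operation, parsedArgs, numArgs) {
  let args = operation.args;
  if (parsedArgs.length < numArgs) {
    // Too few arguments: skip the command, but clear the array in place
    // instead of replacing it with null.
    if (args !== null) {
      args.length = 0;
    }
    return false;
  }
  if (args === null) {
    args = operation.args = [];
  }
  args.length = 0;
  for (const arg of parsedArgs) {
    args.push(arg);
  }
  return true;
}

const sharedOperation = { args: [], };
const sharedArgs = sharedOperation.args;
const goodRead = readInto(sharedOperation, [12, "F1"], 2);
const argsAfterGoodRead = sharedOperation.args.slice();
const badRead = readInto(sharedOperation, [12], 2);
```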
Chas Emerick	85c52f1fd6	Fix getTextContent evaluation to only apply TJ horizontal offsets using numeric items/args While the array argument to TJ should only contain strings and numbers, other unfortunate items are found in PDFs in the wild, e.g.: [(Grandes) 0.0 Tc -250.0 (Client\350les,) 0.0 Tc -250.0 (Financements) 0.0 Tc -250.0 (et) 0.0 Tc -250.0 (March\351s) ] TJ getOperatorList already properly ignores any non-string, non-numeric values in TJ arrays; without this patch to getTextContent, returned text items can have NaN widths due to calculations being applied to those non-numeric values.	2016-10-13 08:08:31 -04:00
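The failure mode described in 85c52f1fd6 is easy to reproduce in isolation. In this sketch the stray `Tc` entry is modelled as a plain object, and both functions and the `scale` parameter are hypothetical; the point is only that arithmetic on a non-numeric item produces `NaN`, while a `typeof` guard avoids it.

```javascript
// A TJ-like array containing strings, numbers, and one stray
// non-string, non-numeric item (modelled here as an object).
const tjItems = ["Grandes", 0.0, { cmd: "Tc", }, -250.0, "Client\u00e8les,"];

// Unguarded: every non-string item is treated as a numeric offset.
function sumOffsetsUnguarded(items, scale) {
  let total = 0;
  for (const item of items) {
    if (typeof item !== "string") {
      total += item * scale; // object * number evaluates to NaN
    }
  }
  return total;
}

// Guarded: only genuine numbers contribute to the offset.
function sumOffsetsGuarded(items, scale) {
  let total = 0;
  for (const item of items) {
    if (typeof item === "number") {
      total += item * scale;
    }
  }
  return total;
}

const unguardedTotal = sumOffsetsUnguarded(tjItems, 1);
const guardedTotal = sumOffsetsGuarded(tjItems, 1);
```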
Jonas Jenwald	116ba19dd9	Respect the 'ColorTransform' entry in the image dictionary when decoding JPEG images (bug 956965, issue 6574) Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=956965. Fixes 6574.	2016-09-26 21:55:43 +02:00
Jonas Jenwald	356b321f6d	Fallback to the `StandardEncoding` for Nonsymbolic fonts without `/Encoding` entry (issue 7580) Even though this patch passes all tests (unit/font/reference) locally, including the new ones that I added in PR 7621, I'm still a bit nervous about modifying the code that chooses the fallback encoding for fonts without an `/Encoding` entry. Note that over the years this code has been changed on a number of occasions, see a possibly incomplete [list here], to deal with various cases of incorrect font data. According to the PDF specification, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1904184, it seems that we should fallback to the `StandardEncoding` for Nonsymbolic fonts. There's obviously a risk that fixing this particular issue could break other PDF files for which we don't have tests. However I've tried to change the logic as little as possible in this patch, to hopefully reduce possible breakage. Based on debugging numerous font issues, it seems that a lot of fonts actually set the Symbolic flag, even when they are in fact not Symbolic. Fonts actually marked as Nonsymbolic seem to be somewhat less common, which I hope should reduce the risk of the patch somewhat. Fixes 7580.	2016-09-13 14:07:16 +02:00
Jonas Jenwald	325f7afcca	For embedded Type1 fonts without included `ToUnicode`/`Encoding` data, attempt to improve text selection by using the `builtInEncoding` to amend the `toUnicode` map (issue 6901, issue 7182, issue 7217, bug 917796, bug 1242142) Note that in order to prevent any possible issues, this patch does not try to amend the `toUnicode` data for Type1 fonts that contain either `ToUnicode` or `Encoding` entries in the font dictionary. Fixes, or at least improves, issues/bugs such as e.g. 6658, 6901, 7182, 7217, bug 917796, bug 1242142.	2016-09-11 20:54:10 +02:00
Yury Delendik	ffa99397ad	Merge pull request #7387 from Snuffleupagus/issue-5808 Attempt to ignore multiple identical Tf (setFont) commands in `PartialEvaluator_getTextContent` (issue 5808)	2016-08-30 15:21:41 -05:00
Jonas Jenwald	83ce6f0b6d	Adjust the (applicable) existing `isName` callsites to use the new `isName(v, name)` version of the function	2016-08-10 11:15:08 +02:00
Jonas Jenwald	77c6ed5389	Attempt to ignore multiple identical Tf (setFont) commands in `PartialEvaluator_getTextContent` (issue 5808) This patch improves the performance of issue 5808, but I'm not sure if it's enough to call it fixed. On average, this patch reduces the number of textLayer div's by a factor of 3, and it also reduces the time spent in `getTextContent` by a factor of ~2. The PDF file is generated by `Scribus PDF`, which for reasons I cannot understand is placing redundant `Tf` commands before every showText command. Note how the PDF file also contains lots of (basically) identical fonts, but with slightly different names, which causes unnecessary font-switching. This causes some unnecessary breaking of textLayer div's, but this issue cannot be easily worked around.	2016-07-27 21:37:52 +02:00
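The idea of ignoring repeated `Tf` commands in 77c6ed5389 can be sketched as a filter over a command list. The command objects, the `dropRedundantSetFont` function, and the font name `F1` are hypothetical; the actual patch works inside `getTextContent` rather than on a separate list.

```javascript
// Sketch: drop a Tf (setFont) command when it selects the font name
// and size that are already active.
function dropRedundantSetFont(commands) {
  const kept = [];
  let currentName = null;
  let currentSize = null;
  for (const command of commands) {
    if (command.op === "Tf") {
      const [name, size] = command.args;
      if (name === currentName && size === currentSize) {
        continue; // identical to the active font: nothing would change
      }
      currentName = name;
      currentSize = size;
    }
    kept.push(command);
  }
  return kept;
}

const sampleCommands = [
  { op: "Tf", args: ["F1", 12], },
  { op: "Tj", args: ["Hello"], },
  { op: "Tf", args: ["F1", 12], }, // redundant
  { op: "Tj", args: ["world"], },
  { op: "Tf", args: ["F1", 10], }, // size differs, so it is kept
];

const filteredCommands = dropRedundantSetFont(sampleCommands);
```

As the commit message notes, fonts that are effectively identical but carry different names would still pass through a name-based check like this one.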
