Sakurai/pdf.js

Author	SHA1	Message	Date
Calixte Denizet	58e1f51688	XFA - Fix text positions (bug 1718741) - font line height is taken into account by Acrobat when it isn't with masterpdfeditor: I extracted a font from a pdf, modified some ascent/descent properties thanks to ttx and then reinjected the font in the pdf: only Acrobat takes it into account. So in this patch, line heights for some substituted fonts are added. - it seems that Acrobat is using a line height of 1.2 when the line height in the font is not enough (it's the only way I found to fix correctly bug 1718741). - don't use flex in wrapper container (which was causing a horizontal overflow in the above bug). - consequently, the above fixes introduced a lot of small regressions, so in order to see real improvements on reftests, I fixed the regressions in this patch: - replace margin by padding in some cases where padding is a part of a container's dimensions; - remove some flex display: some containers are wrongly sized when rendered; - set letter-spacing to 0.01px: it helps to be sure that text is not broken because of not enough width in Firefox.	2021-07-09 18:11:12 +02:00
Jonas Jenwald	273d8cb746	Add non-PRODUCTION/TESTING overflow `assert`s to various string helper-functions (issue 6759)	2021-06-27 16:06:30 +02:00
Jonas Jenwald	c4334dcfe7	Allow using the standard font data for non-Type1 fonts (issue 13585, PR 12726 follow-up) Given that we're not imposing any font-type restrictions[1] in the non-/FontDescriptor case, it's not really clear to me why we'd actually need to do that in the general case. Please note that there's some expected movement, all of which should be improvements, in the `fips197.pdf` file with this patch. --- [1] With the exception of Type3-fonts, of course.	2021-06-20 11:13:49 +02:00
Jonas Jenwald	d9ed14a2f5	Set the default value of `useSystemFonts` correctly, depending on `disableFontFace`, in the API (PR 13516 follow-up) Sorry about the churn here, since the change that I made in PR 13516 was not very smart. With the current code, it's now impossible for a user to actually control the `useSystemFonts` option manually. To prevent outright breakage we obviously still need to default to setting `useSystemFonts = false` when `disableFontFace === true`, however that should be possible for an API consumer to override.	2021-06-19 13:53:13 +02:00
Jonas Jenwald	229a49b9b9	Re-factor the `fallbackToUnicode` functionality (PR 9192 follow-up) Rather than having to create and check a separate `ToUnicodeMap` to handle these cases, we can simply use the `fallbackToUnicode`-data (when it exists) to directly supplement missing /ToUnicode entries in the regular `ToUnicodeMap` instead.	2021-06-14 15:05:14 +02:00
Jonas Jenwald	edc38de37a	Convert `PartialEvaluator.buildToUnicode` to an `async` method This removes the need to manually wrap all return values in a Promise.	2021-06-14 15:05:14 +02:00
Jonas Jenwald	69477bfb06	Always use standard font data, with `disableFontFace` set in the API (PR 12726 follow-up) We must force-fetch standard font data, when `disableFontFace = true` is set in the API, since otherwise rendering in e.g. the viewer is still broken (same as before PR 12726 landed). Please note: We still need to also load standard font data for patterns and/or some text-rendering modes, however that will require larger changes so I figured that it cannot hurt to submit this patch right now.	2021-06-09 21:21:02 +02:00
Jonas Jenwald	a01c599247	Cache the "raw" standard font data in the worker-thread (PR 12726 follow-up) This implementation is basically a copy of the pre-existing `builtInCMapCache` implementation. For some, badly generated, PDF documents it's possible that we'll end up having to fetch the same standard font data over and over (which is obviously inefficient). While not common, it's certainly possible that a PDF document uses custom font names where the actual font then references one of the standard fonts; see e.g. issue 11399 for one such example. Note that I did suggest adding worker-thread caching of standard font data in PR 12726, however it wasn't deemed necessary at the time. Now that we have a real-world example that benefits from caching, I think that we should simply implement this now.	2021-06-09 18:27:51 +02:00
Calixte Denizet	34a2fa72c7	XFA - Add Liberation-Sans font as a substitution for some missing fonts - Some js files contain scale factors for each glyph in order to rescale Liberation to have a final font with the correct width. - A lot of XFA have some containers where their dimensions are based on their text content, so using default font from browser can lead to an almost unreadable pdf.	2021-06-09 16:55:45 +02:00
Jonas Jenwald	d995f90183	Fetch binary CMap data in the worker-thread, when `useWorkerFetch` is set This patch uses the new option added in PR 12726 to also allow fetching binary CMap data directly in the worker-thread in browsers. Given that these changes remove the need to transfer data between threads for the default (browser) use-case, we can also revert the changes in PR 11118 since that simplifies the overall implementation.	2021-06-08 21:51:07 +02:00
Jonas Jenwald	e7dc822e74	Merge pull request #12726 from brendandahl/standard-fonts [api-minor] Include and use the 14 standard font files.	2021-06-08 10:09:40 +02:00
Brendan Dahl	4c1dd47e65	Include and use the 14 standard fonts files.	2021-06-07 11:10:11 -07:00
Jonas Jenwald	eefc94ceb7	Ensure that we fully load Type3 fonts in `PartialEvaluator.getTextContent` This is necessary now, since with the previous patch the /FontBBox potentially depends on the contents of the /CharProcs-streams. Note that if `getOperatorList` is called before `getTextContent`, this patch doesn't matter since the font is already fully loaded/parsed. However, for e.g. the `text` test-cases this is necessary to ensure correct reference images.	2021-06-05 08:09:29 +02:00
Jonas Jenwald	20770cb06a	Improve text-selection for Type3 fonts with empty /FontBBox-entries (issue 6605) For Type3 fonts where the /CharProcs-streams of the individual glyphs start with a `d1` operator, we can use that to build a fallback bounding box for the font and thus improve text-selection in some cases.	2021-06-05 08:09:29 +02:00
Jonas Jenwald	e3bde56311	Ensure that the old/new `options` are correctly combined in `PartialEvaluator.clone`	2021-05-31 12:14:53 +02:00
Jonas Jenwald	c4429bc3f2	Do the `isType3Font`-check once, rather than repeating it, in `PartialEvaluator.translateFont` This is a small piece of clean-up that I happened to notice while browsing the code.	2021-05-22 11:46:37 +02:00
Jonas Jenwald	68350378c0	Handle errors gracefully, in `PartialEvaluator.buildFontPaths`, when glyph path building fails The building of glyph paths, in the `FontRendererFactory`, can fail in various ways for corrupt font data. However, we're currently not attempting to handle any such errors in the evaluator, which means that a single broken glyph can prevent an entire page from rendering. To address this we simply have to pass along, and check, the existing `ignoreErrors` option in `PartialEvaluator.buildFontPaths` similar to the rest of the `PartialEvaluator` code.	2021-05-22 11:46:31 +02:00
Jonas Jenwald	718f7bf7e1	Fix a few safe ESLint `no-var` failures in `src/core/evaluator.js` (13371 follow-up) As can be seen in PR 13371, some of the `no-var` changes in the `PartialEvaluator.{getOperatorList, getTextContent}` methods caused errors in `gulp server`-mode. However, there's a handful of instances of `var` in other methods which should be completely safe to convert since there's no strange scope-issues present in that code.	2021-05-16 15:22:43 +02:00
Jonas Jenwald	8943bcd3c3	Account for formatting changes in Prettier version `2.3.0` With the exception of one tweaked `eslint-disable` comment, in `web/generic_scripting.js`, this patch was generated automatically using `gulp lint --fix`. Please find additional information at: - https://github.com/prettier/prettier/releases/tag/2.3.0 - https://prettier.io/blog/2021/05/09/2.3.0.html	2021-05-16 11:44:05 +02:00
Jonas Jenwald	75208d36c2	Revert "Fix the remaining `no-var` failures, which couldn't be handled automatically, in the `src/core/evaluator.js` file" (PR 13344 follow-up) This reverts commit 0ef9b5aafc88094f19fec793c174c622e7e15542, since it causes a lot of warnings (see below) locally with e.g. the document from issue 9627. Strangely enough, this only occurs with `gulp server`-mode and the actual builds are apparently fine. It seems that this may be some unfortunate interaction with the old Babel-plugin that's used together with SystemJS. ``` Warning: getTextContent - ignoring ExtGState: "FormatError: ExtGState should be a dictionary.". ``` Rather than taking the risk that this could actually cover a more serious bug, and since I cannot immediately figure out what's wrong, it thus seems safest to revert this for now and we can (carefully) revisit this once SystemJS has been removed (see PR 12563).	2021-05-13 11:19:46 +02:00
Jonas Jenwald	6eef69de22	Export the "raw" `toUnicode`-data from `PartialEvaluator.preEvaluateFont` Compared to other data-structures, such as e.g. `Dict`s, we're purposely not caching Streams on the `XRef`-instance.[1] The, somewhat unfortunate, effect of Streams not being cached is that repeatedly getting the same Stream-data requires re-parsing/re-initializing of a bunch of data; see `XRef.fetch` and related methods. For the font-parsing in particular we're currently fetching the `toUnicode`-data, which is very often a Stream, in `PartialEvaluator.preEvaluateFont` and then again in `PartialEvaluator.extractDataStructures` soon afterwards. By instead letting `PartialEvaluator.preEvaluateFont` export the "raw" `toUnicode`-data, we can avoid some unnecessary re-parsing/re-initializing when handling fonts. Please note: In this particular case, given that `PartialEvaluator.preEvaluateFont` only accesses the "raw" `toUnicode` data, exporting a Stream should be safe. --- [1] The reasons for this include: - Streams, especially `DecodeStream`-instances, can become very large once read. Hence caching them really isn't a good idea simply because of the (potential) memory impact of doing so. - Attempting to read from the same Stream-instance more than once won't work, unless it's `reset` in between, since using any method such as e.g. `getBytes` always starts at the current data position. - Given that parsing, even in the worker-thread, is now fairly asynchronous it's generally impossible to assert that any one Stream-instance isn't being accessed "concurrently" by e.g. different `getOperatorList` calls. Hence `reset`-ing a cached Stream-instance isn't going to work in the general case.	2021-05-08 12:04:13 +02:00
Jonas Jenwald	13fb1654dc	Export the `firstChar`/`lastChar`-data from `PartialEvaluator.preEvaluateFont` Rather than re-fetching/re-parsing these properties immediately in `PartialEvaluator.translateFont`, we can simply export them instead. (Obviously the effect will be really tiny, but there is less parsing overall this way.)	2021-05-08 12:02:49 +02:00
Jonas Jenwald	8a1cb82aee	Ensure that the `Widths` array is parsed correctly in `PartialEvaluator.preEvaluateFont` Please note: While I don't have a document that this patch fixes, the current code is however not entirely correct as far as I can tell. Looking at how the `Widths` array is parsed in `PartialEvaluator.extractWidths`, it's clear that the implementation in `PartialEvaluator.preEvaluateFont` is a bit too simplistic. In particular, by only wrapping the data into a TypedArray, there's no attempt to handle indirect objects which could potentially lead to colliding `hash`es being computed.	2021-05-07 21:23:44 +02:00
Jonas Jenwald	30b2739adf	Ensure that composite/non-composite fonts won't get the same `hash` in `PartialEvaluator.preEvaluateFont` To hopefully help prevent any future bugs, make sure that composite/non-composite fonts cannot accidentally get matching `hash`es. Given the differences between those font types, that's very unlikely to be useful or even correct in general.	2021-05-07 21:22:37 +02:00
Jonas Jenwald	fc59a5f709	Take the `W` array into account when computing the hash, in `PartialEvaluator.preEvaluateFont`, for composite fonts (issue 13343) Without this some composite fonts may incorrectly end up with matching `hash`es, thus breaking rendering since we'll not actually try to load/parse some of the fonts. Please note: Given that the document, in the referenced issue, doesn't embed any of its fonts there's no guarantee that it renders correctly in all configurations even with this patch.	2021-05-07 21:22:36 +02:00
Jonas Jenwald	0ef9b5aafc	Fix the remaining `no-var` failures, which couldn't be handled automatically, in the `src/core/evaluator.js` file The only slight complication here were some of the `switch`-cases, in `getOperatorList`/`getTextContent`, where the parsing is done asynchronously. However, those cases are easy to deal with by wrapping the code within its own block; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/switch#block-scope_variables_within_switch_statements	2021-05-06 10:21:05 +02:00
Jonas Jenwald	f93c3b9aa7	Enable the `no-var` rule in the `src/core/evaluator.js` file These changes were made automatically, using `gulp lint --fix`.	2021-05-06 09:39:21 +02:00
Jonas Jenwald	77b258440b	Move some constants and helper functions from `src/core/fonts.js` and into their own file - `FontFlags`, is used in both `src/core/fonts.js` and `src/core/evaluator.js`. - `getFontType`, same as the above. - `MacStandardGlyphOrdering`, is a fairly large data-structure and `src/core/fonts.js` is already a very large file. - `recoverGlyphName`, a dependency of `type1FontGlyphMapping`; please see below. - `SEAC_ANALYSIS_ENABLED`, is used by both `Type1Font`, `CFFFont`, and unit-tests; please see below. - `type1FontGlyphMapping`, is used by both `Type1Font` and `CFFFont` which a later patch will move to their own files.	2021-05-02 21:00:29 +02:00
Jonas Jenwald	6912bb5e0a	Move the `IdentityToUnicodeMap`/`ToUnicodeMap` from `src/core/fonts.js` and into its own file	2021-05-02 21:00:29 +02:00
Tim van der Meij	f6f335173d	Merge pull request #13303 from Snuffleupagus/BaseStream Add an abstract base-class, which all the various Stream implementations inherit from	2021-05-01 19:13:36 +02:00
calixteman	af4dc55019	[api-minor] Fix the way to chunk the strings (#13257) - Improve chunking in order to fix some bugs where spaces are missing: * track the last position where a glyph has been drawn; * when a new glyph (first glyph in a chunk) is added, compare its position with the last saved one and add a space or break: - there are multiple ways to move the glyphs, and to avoid having to deal with all the different possibilities it's way easier to just compare positions; - and so there is now one function (i.e. "compareWithLastPosition") where all the work is done. - Add some breaks in order to get lines; - Remove the multiple white spaces: * some spaces were filled with several white spaces, which makes it harder to find some sequences of words using the search tool; * other pdf readers replace spaces by one white space. Update src/core/evaluator.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-04-30 14:41:13 +02:00
Jonas Jenwald	30a22a168d	Move the `DecodeStream` and `StreamsSequenceStream` from `src/core/stream.js` and into its own file	2021-04-28 10:16:51 +02:00
Jonas Jenwald	da22146b95	Replace a bunch of `Array.prototype.forEach()` cases with `for...of` loops instead Using `for...of` is a modern and generally much nicer pattern, since it gets rid of unnecessary callback-functions. (In a couple of spots, a "regular" `for` loop had to be used.)	2021-04-24 13:00:19 +02:00
Jonas Jenwald	7fab73ed23	For CFF fonts without proper `ToUnicode`/`Encoding` data, utilize the "charset"/"Encoding"-data from the font file to improve text-selection (issue 13260) This patch extends the approach, implemented in PR 7550, to also apply to CFF fonts.	2021-04-20 20:48:44 +02:00
Brendan Dahl	ac3fa1e3d7	Merge pull request #13146 from calixteman/xfa_fonts XFA -- Load fonts permanently from the pdf	2021-04-16 12:55:12 -07:00
Calixte Denizet	7e9579045f	XFA -- Load fonts permanently from the pdf - Different fonts can be used in xfa and some of them are embedded in the pdf. - Load all the fonts in window.document. Update src/core/document.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com> Update src/core/worker.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-04-15 17:57:42 +02:00
Jani Pehkonen	3a96977ea8	Implement visibility expressions for optional content	2021-04-14 17:39:41 +03:00
Brendan Dahl	fc9501a637	Add support for basic structure tree for accessibility. When a PDF is "marked" we now generate a separate DOM that represents the structure tree from the PDF. This DOM is inserted into the <canvas> element and allows screen readers to walk the tree and have more information about headings, images, links, etc. To link the structure tree DOM (which is empty) to the text layer aria-owns is used. This required modifying the text layer creation so that marked items are now tracked.	2021-04-09 09:56:28 -07:00
Jonas Jenwald	68d3a333ac	Change the `seenStyles` object, in `PartialEvaluator.getTextContent`, to a Set Given that what we actually want is only to keep track of the loadedFont-names, rather than storing any actual data, using an object isn't really necessary here. Furthermore, in the current code, we're also using `in` when checking if the data exists, which is generally less efficient than just checking for the value directly.	2021-04-05 10:34:02 +02:00
Jonas Jenwald	0eb1433c78	[api-minor] Change the format of the `fontName`-property, in `defaultAppearanceData`, on Annotation-instances (PR 12831 follow-up) Currently the `fontName`-property contains an actual /Name-instance, which is a problem given that its fallback value is an empty string; see `ca7f546828/src/core/default_appearance.js (L35)` The reason that this is a problem can be seen in `ca7f546828/src/core/primitives.js (L30-L34)`, since an empty string short-circuits the cache. Essentially, in PDF documents, a /Name-instance cannot be empty and the way that the `DefaultAppearanceEvaluator` does things is unfortunately not entirely correct. Hence the `fontName`-property is changed to instead contain a string, rather than a /Name-instance, which simplifies the code overall. Please note: I'm tagging this patch with "[api-minor]", since PR 12831 is included in the current pre-release (although we're not using the `fontName`-property in the display-layer).	2021-04-01 16:47:30 +02:00
Jonas Jenwald	1ee747a620	Remove unneeded `instanceof MissingDataException` checks The following checks are all unneeded, and could easily cause confusion when reading the code. (All of them are my fault as well, since I've sometimes added those checks without really thinking about the surrounding code.) - In `PartialEvaluator.hasBlendModes` there cannot be any `MissingDataException`s thrown, given that the `Page.getOperatorList` method waits for all the necessary /Resources to load first. Furthermore, note also that if an error is thrown from `PartialEvaluator.hasBlendModes` then it'd completely break rendering of that page, since any errors thrown from `Page.getOperatorList` are simply sent to the main-thread. - In `PartialEvaluator.handleColorN` there cannot be any `MissingDataException`s thrown, given that again the `Page.getOperatorList` method waits for all the necessary /Resources to load before operatorList parsing starts. - In `XRef.readXRef` there cannot be any `MissingDataException`s thrown, given that we're explicitly requesting (and waiting for) the entire document in `pdfManagerReady` (in `src/core/worker.js`) before re-parsing of a corrupt document starts.	2021-02-13 12:26:05 +01:00
Jonas Jenwald	31098c404d	Use `Math.hypot`, instead of `Math.sqrt` with manual squaring (#12973) When the PDF.js project started `Math.hypot` didn't exist yet, and until recently we still supported browsers (IE 11) without a native `Math.hypot` implementation; please see this compatibility information: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/hypot#browser_compatibility Furthermore, somewhat recently there were performance improvements of `Math.hypot` in Firefox; see https://bugzilla.mozilla.org/show_bug.cgi?id=1648820 Finally, this patch also replaces a couple of multiplications with the exponentiation operator.	2021-02-10 12:28:49 +01:00
Jonas Jenwald	e6fe8a7d53	Handle errors gracefully, in `PartialEvaluator.translateFont`, when fetching the font file (issue 9462) The third page of the referenced PDF document currently fails to render completely, since one of its font files fails to load. Since that error isn't handled, a large part of the text is thus missing which looks quite bad. By "replacing" the font data with an empty stream, we'll thus be able to fall back to rendering the text with a standard font (instead of using `ErrorFont`). While there's obviously no guarantee that things will look perfect, actually rendering the text at all should be an improvement in general. Also, print a warning in `PartialEvaluator.loadFont` when the `PartialEvaluator.translateFont` method rejects, since that'd have helped debug/fix the issue faster.	2021-02-06 19:44:53 +01:00
Tim van der Meij	e4e92d10e8	Merge pull request #12922 from Snuffleupagus/getTextContent-globalImageCache Ignore globally cached images in `PartialEvaluator.getTextContent` (PR 11930 follow-up)	2021-01-28 23:44:10 +01:00
Tim van der Meij	8805614a03	Merge pull request #12924 from brendandahl/fix-clone Fix font data clone error when pdfBug is enabled.	2021-01-28 23:42:12 +01:00
Jonas Jenwald	72da2aa166	Ignore globally cached images in `PartialEvaluator.getTextContent` (PR 11930 follow-up) Given that we'll only cache `/XObject`s of the `Image`-type globally, we can utilize that in `PartialEvaluator.getTextContent` as well. This way, in cases such as e.g. issue 12098, we can avoid having to fetch/parse `/XObject`s that we already know to be `Image`s. This is helpful, since `Stream`s are not cached on the `XRef` instance (given their potential size) and the lookup can thus be somewhat expensive in general. Also, skip a redundant `RefSetCache.has` check in the `GlobalImageCache.getData` method.	2021-01-28 10:19:26 +01:00
Brendan Dahl	52fb5abb0b	Fix font data clone error when pdfBug is enabled. The widths property should be an object to match what metrics returns. In ZapfDingbats.pdf I was getting a data clone error with pdfBug enabled. In buildCharCodeToWidth() there was an encoding with the name "at" which is also the name of a method on an array. buildCharCodeToWidth assumes an object is passed in, so when it checked for the "at" property, it found the method and copied it over. This only seemed to affect Firefox.	2021-01-27 14:38:43 -08:00
Jonas Jenwald	1ab6d2c604	Improve global image caching for small images (PR 11912 follow-up, issue 12098) When implementing the `GlobalImageCache` functionality I was mostly worried about the effect of very large images, hence the maximum number of cached images was purposely kept quite low[1]. However, there's one fairly obvious problem with that approach: In documents with hundreds, or even thousands, of small images the `GlobalImageCache` as implemented becomes essentially pointless. Hence this patch, where the `GlobalImageCache`-implementation is changed in the following ways: - We're still guaranteed to be able to cache a minimum number of images, set to `10` (similar to before). - If the total size of all the cached image data is below a threshold[2], we're allowed to cache additional images. This patch thus improves, but doesn't completely fix, issue 12098. Note that that document is created by a very poor PDF generator, since every single page contains the entire document (with all of its /Resources) and to create the individual pages clipping is used.[3] --- [1] Currently set to `10` images; imagine what would happen to overall memory usage if we encountered e.g. 50 images each 10 MB in size. [2] This value was chosen, somewhat randomly, to be `40` megabytes; basically five times the [maximum individual image size per page](`6249ef517d/src/display/api.js (L2483-L2484)`). [3] This surely has to be some kind of record w.r.t. how badly PDF generators can mess things up...	2021-01-26 12:00:12 +01:00
Jonas Jenwald	8137c0547d	Fix the `gStateObj` lookup in `TranslatedFont._removeType3ColorOperators` (PR 12718 follow-up) As can be seen in `2cba290361/src/core/evaluator.js (L986)` the `gStateObj` (which is actually an Array despite its name), is wrapped in an Array when it's inserted into the OperatorList. Hence we obviously need to take this into account when accessing it in `TranslatedFont._removeType3ColorOperators`; this mistake happened because we don't have any test-cases for this particular code-path as far as I know.	2021-01-22 12:27:38 +01:00
calixteman	1039698697	Add a parser to get font data from the default appearance (#12831) * Add a parser to get font data from the default appearance - pdfium & poppler also use a special parser to get this info. * Update src/core/default_appearance.js Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>	2021-01-21 20:15:31 +01:00
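
The caching policy described in commit 1ab6d2c604 (a guaranteed minimum of `10` images, plus additional images while the total cached data stays below `40` megabytes) can be sketched roughly as follows. This is only an illustration of the rule as stated in the commit message, not the actual `GlobalImageCache` code: the class name `ImageCacheSketch`, its methods, the constant names, and the reading of one megabyte as 1024 * 1024 bytes are all assumptions.

```javascript
// Hypothetical sketch of the policy from commit 1ab6d2c604; every name here
// is made up for illustration and does not come from the pdf.js source.
const MIN_IMAGES_TO_CACHE = 10; // guaranteed minimum, per the commit message
const MAX_BYTE_SIZE = 40 * 1024 * 1024; // "40 megabytes" (1 MB = 1024 * 1024 bytes is assumed)

class ImageCacheSketch {
  constructor() {
    this._entries = new Map(); // key -> size of the image data in bytes
  }

  // Total size, in bytes, of all image data cached so far.
  get byteSize() {
    let total = 0;
    for (const size of this._entries.values()) {
      total += size;
    }
    return total;
  }

  // A new image may be cached while we're below the guaranteed minimum
  // count, or while the data cached so far is below the byte threshold.
  canCache() {
    if (this._entries.size < MIN_IMAGES_TO_CACHE) {
      return true;
    }
    return this.byteSize < MAX_BYTE_SIZE;
  }

  // Returns true if the image is (or already was) cached, false otherwise.
  add(key, byteSize) {
    if (this._entries.has(key)) {
      return true;
    }
    if (!this.canCache()) {
      return false;
    }
    this._entries.set(key, byteSize);
    return true;
  }

  get size() {
    return this._entries.size;
  }
}
```

Under this sketch, a document with a few very large images stops caching after the tenth one, while a document with hundreds of small images can keep caching them, which is the behaviour the commit message is aiming for.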
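
For commit 31098c404d, the kind of replacement it describes looks roughly like the following. The function names are hypothetical and chosen only for this example; the snippet is not taken from the patch itself.

```javascript
// Illustrative only: these helpers are not from the pdf.js source.

// Before: Euclidean length using `Math.sqrt` with manual squaring.
function lengthWithSqrt(x, y) {
  return Math.sqrt(x * x + y * y);
}

// After: the same computation using `Math.hypot`.
function lengthWithHypot(x, y) {
  return Math.hypot(x, y);
}

// The exponentiation operator replaces a manual multiplication.
function squareWithMultiplication(x) {
  return x * x;
}
function squareWithExponentiation(x) {
  return x ** 2;
}
```

Both length helpers compute the same value for ordinary inputs; `Math.hypot` additionally accepts any number of arguments.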
