Sakurai/pdf.js - pdf.js - Gitea on kemo

Sakurai/pdf.js

Author	SHA1	Message	Date
Yury Delendik	a02e2686b9	Merge pull request #7475 from Snuffleupagus/api-getTextContent-combineTextItems [api-minor] Add a parameter to `PDFPageProxy_getTextContent` that controls whether `PartialEvaluator_getTextContent` will attempt to combine same line text items	2016-07-27 08:34:24 -05:00
Brendan Dahl	5678486802	Merge pull request #7347 from Snuffleupagus/evaluator-more-Ref_toString Slightly refactor the `fontRef` handling in `PartialEvaluator_loadFont` (issue 7403 and issue 7402)	2016-07-22 17:21:47 -07:00
Brendan Dahl	50d6e4f147	Merge pull request #7447 from Snuffleupagus/buildToUnicode-notdef Ignore .notdef in the `differences` array when building a fallback `toUnicode` map in `PartialEvaluator_buildToUnicode` (issue 5256)	2016-07-22 14:33:32 -07:00
Jonas Jenwald	390c02a3e9	Attempt to cache fonts that are direct objects (i.e. `Dict`s), as opposed to `Ref`s, to prevent re-rendering after `cleanup` from breaking (issue 7403 and issue 7402) Fonts that are not referenced by `Ref`s are very uncommon in practice, but it can unfortunately happen. In this case, we're currently not caching them in the usual way, i.e. by `Ref`, which leads to failures when a page is rendered after `cleanup` has run. The simplest solution would have been to remove the `font.translated` workaround, but since this would have meant loading these kinds of fonts over and over, the patch attempts to be a bit clever about this situation. Note that if we instead loaded fonts per page, instead of per document, this issue wouldn't have existed.	2016-07-21 16:04:07 +02:00
Jonas Jenwald	2e9cd3ea64	Slightly refactor the `fontRef` handling in `PartialEvaluator_loadFont` (issue 7403 and issue 7402) Originally, I was just going to change this code to use `Ref_toString` in a couple more places. When I started reading the code, I figured that it wouldn't hurt to clean up a couple of comments. While doing this, I noticed that the logic for the (rare) `isDict(fontRef)` case could do with a few improvements. There should be no functional changes with this patch, but given the added reference checks, we will now avoid bogus `Ref`s when resolving font aliases. In practice, as issue 7403 shows, the current code can break certain PDF files even if it's very rare. Note that the only thing that this patch will change, is the `font.loadedName` in the case where a `fontRef` is a reference and the font doesn't have a descriptor. Previously for `fontRef = Ref(4, 0)` we'd get `font.loadedName = 'g_d0_f4_0'`, and with this patch `font.loadedName = g_d0_f4R`, which is actually one character shorter in most cases. (Given that `Ref_toString` contains an optimization for the `gen === 0` case, which is by far the most common `gen` value.) In the already existing fallback case, where the `fontName` is used when creating the `font.loadedName`, we allow any alphanumeric character. Hence I don't see how (as mentioned above) e.g. `font.loadedName = g_d0_f4R` would be an issue here.	2016-07-21 16:03:33 +02:00
Jonas Jenwald	f297e4d17c	[api-minor] Add a parameter to `PDFPageProxy_getTextContent` that controls whether `PartialEvaluator_getTextContent` will attempt to combine same line text items From the discussion in issue 7445, it seems that there may be cases where an API consumer would want to get the text content as is, without combined text items.	2016-07-19 13:38:57 +02:00
klemens	6f03f62327	trivial spelling fixes	2016-07-17 14:33:41 +02:00
Jonas Jenwald	bdd58ab1d2	Ignore .notdef in the `differences` array when building a fallback `toUnicode` map in `PartialEvaluator_buildToUnicode` (issue 5256) Fixes 5256.	2016-06-27 16:20:23 +02:00
Jonas Jenwald	b02d560ae0	Fix errors in `setGState` in `PartialEvaluator_getTextContent` that prevent text-selection from working properly Currently `setGState` is completely broken, and looking through the history of that code, it seems to me that this may never have worked correctly. This patch fixes the text-selection in `extgstate.pdf` in the test-suite, which is also added as a `text` test.	2016-06-01 22:58:49 +02:00
Jonas Jenwald	7ddb0bc718	Attempt to combine text runs positioned with `setTextMatrix`	2016-05-18 17:21:58 +02:00
Jonas Jenwald	6111c17c8a	Use `Dict_getArray` in more places in `src/core/` to avoid issues when Arrays contain indirect objects As evident from e.g. PRs 6485 and 7118, some bad PDF generators unfortunately create Arrays where some elements are indirect objects (i.e. `Ref`s). This seems to mostly affect Arrays that contain numbers, such as e.g. `Matrix/FontMatrix/BBox/FontBBox/Rect/Color/...`, and has manifested itself in PDF files that fail to render correctly (some elements are missing). The problem in both the cases above, besides broken rendering, was that there were no errors/warnings that indicated what the problem was, making it difficult to pinpoint the issue. Hence this patch, where I've audited all usages of `Dict_get` in `src/core/` files, and replaced it with `Dict_getArray` where appropriate to try and prevent unnecessary future bugs.	2016-05-05 19:42:57 +02:00
Jonas Jenwald	f59c3a0644	Remove the remaining usages of `new {Name,Cmd}` in favor of `{Name,Cmd}.get` Using `new {Name,Cmd}` should be avoided, since it creates a new object on every call, whereas `{Name,Cmd}.get` uses caches to only create one object regardless of how many times they are called. Most of these are found in the unit-tests, where increased memory usage probably doesn't matter very much. But it still seems good to get rid of those cases, since no part of the codebase ought to advertise that usage. Given the small size of the patch, I'm also tweaking a few comments and class names.	2016-04-08 12:14:05 +02:00
Yury Delendik	a250c150ab	Merge pull request #7134 from yurydelendik/circ-stream-colorspace Refactors to remove stream.js dependency on colorspace.js	2016-04-01 08:23:24 -05:00
Yury Delendik	35cbf74b12	Refactors to remove stream.js dependency on colorspace.js	2016-04-01 07:36:16 -05:00
Jonas Jenwald	05cf709f8e	Parse Type1 font files to determine the various `Length{n}` properties, instead of trusting the PDF file (issue 5686, issue 3928) Fixes 5686. Fixes 3928.	2016-03-31 11:08:12 +02:00
Brendan Dahl	4e2f70440f	Merge pull request #6711 from yurydelendik/errors Better errors capturing at the core and stop rendering on error.	2016-03-29 09:19:28 -07:00
Yury Delendik	bda5e6235e	Removes global PDFJS usage from the src/core/.	2016-03-23 19:24:37 -05:00
Manas	f6d28ca323	Refactors CMapFactory.create to make it async	2016-03-21 23:08:19 +05:30
Yury Delendik	8ba413e761	Better errors capturing at the core and stop rendering on error.	2016-03-11 07:59:09 -06:00
Jonas Jenwald	93ea866f01	Remove `getAll` from `EvaluatorPreprocessor_read` For the operators that we currently support, the arguments are not `Dict`s, which means that it's not really necessary to use `Dict_getAll` in `EvaluatorPreprocessor_read`. Also, I do think that if/when we support operators that use `Dict`s as arguments, that should be dealt with in the corresponding `case` in `PartialEvaluator_getOperatorList` which handles the operator. The only reason that I can find for using `Dict_getAll` like that, is that prior to PR 6550 we would just append certain (currently unsupported) operators without doing any further processing/checking. But as issue 6549 showed, that can lead to issues in practice, which is why it was changed. In an effort to prevent possible issues with unsupported operators, this patch simply ignores operators with `Dict` arguments in `PartialEvaluator_getOperatorList`.	2016-02-12 22:31:50 +01:00
Jonas Jenwald	f7f60197ce	Replace `getAll` with `getKeys` in `loadType3Data` Not only is `getAll` less efficient, but given that we actually need the keys here, using `getKeys` seems much more suitable.	2016-02-10 20:19:14 +01:00
Jonas Jenwald	07e1ad40a2	Replace `getAll` with `getKeys` in `PartialEvaluator_hasBlendModes` to speed up loading of badly generated PDF files (issue 6961) Some bad PDF generators, in particular "Scribus PDF", duplicate resources a lot at various levels of the PDF files. This can lead to `PartialEvaluator_hasBlendModes` taking an unreasonable amount of time to complete. The reason is that the current code is using `Dict_getAll`, which recursively dereferences all indirect objects, which can be really slow. This patch instead uses `Dict_getKeys`, and then manually looks up only the necessary indirect objects. I've added the PDF file as a `load` test. The most important thing here is probably to ensure that the file remains available in the repo, and the comment should help reduce the chance of regressions. (Note that locally, the `load` test times out without this patch, but we cannot really assume that that always happens.) Fixes 6961.	2016-02-10 17:21:38 +01:00
Jonas Jenwald	a1fe2cb443	Don't directly access the private `map` in `setGState`, and ensure that we avoid indirect objects This patch is based on something I noticed while debugging some of the PDF files in issue 6931. In a number of the cases in `setGState`, we're implicitly assuming that we're not dealing with indirect objects (i.e. `Ref`s). See e.g. the 'Font' case, or the various cases where we simply do `gStateObj.push([key, value]);` (since the code in `canvas.js` won't be able to deal with a `Ref` for those cases). The reason that I didn't use `Dict_forEach` instead, is that it would re-introduce the unnecessary closures that PR 5205 removed.	2016-02-03 17:13:42 +01:00
Jonas Jenwald	2d4a1aa0af	Actually ignore no-op `setGState` (PR 5192 followup) The intention of PR 5192 was to avoid adding empty `setGState` ops to the operatorList. But the patch accidentally used `>=`, which means that it's not actually working as intended, since empty arrays always have `length === 0`.	2016-02-03 17:13:02 +01:00
Jonas Jenwald	4770b516fe	Correct the upper bound used when building the `transferMap` for SMasks (PR 6723 followup) Even though the currently known test-cases render correctly without this patch, that seems more like a lucky coincidence, given that there's no guarantee that `transferMap[255] === 0` for every possible transfer function.	2016-02-03 13:41:10 +01:00
Jonas Jenwald	992472fd38	Ensure that we don't modify the `Dict` data when the `Differences` array of a font contains indirect objects This patch fixes an issue that I inadvertently introduced in PR 5815, where we accidentally modify the `Differences` array in the encoding dictionary for indirect objects. Instead of this change, we could also have used the now existing `Dict_getArray`. However in this case I don't think that would have been a good idea, since it would mean iterating through the array twice.	2016-01-30 13:31:24 +01:00
Yury Delendik	2edf2792dc	Replaces literal {} created lookup tables with Object.create	2016-01-28 12:18:38 -06:00
Yury Delendik	d6adf84159	Lazify OP_MAP.	2016-01-28 12:18:37 -06:00
Yury Delendik	1de90454b7	Lazify Metrics	2016-01-28 12:11:46 -06:00
Yury Delendik	55a201d92d	Lazify NormalizedUnicodes	2016-01-28 11:56:42 -06:00
Yury Delendik	d0738d7e24	Lazify stdFontMap, serifFonts, GlyphMapForStandardFonts	2016-01-28 11:51:54 -06:00
Yury Delendik	1a9a665adf	Refactor Encodings	2016-01-28 11:32:59 -06:00
Yury Delendik	6b60c8f4db	Adds UMD headers to core, display and shared files.	2015-12-15 13:24:39 -06:00
Yury Delendik	15c9969abe	Adds transfer function support for SMask.	2015-12-04 12:52:45 -06:00
Yury Delendik	c9cb6a3025	Replaces UnsupportedManager with callback.	2015-11-30 14:42:47 -06:00
Yury Delendik	e4e69e2f05	Set error font for Type3 if its loading failed.	2015-11-27 13:05:51 -06:00
Jonas Jenwald	6dfe53b976	[api-minor] Add a parameter to `PDFPageProxy_getTextContent` that enables replacing of all whitespace with standard spaces in the textLayer (issue 6612) This patch goes a bit further than issue 6612 requires, and replaces all kinds of whitespace with standard spaces. When testing this locally, it actually seemed to slightly improve two existing test-cases (`tracemonkey-text` and `taro-text`). Fixes 6612.	2015-11-25 17:28:40 +01:00
Yury Delendik	06c1904675	Refactors FontLoader to group fonts per document.	2015-11-24 13:27:22 -06:00
Manas	a2ba1b8189	Uses editorconfig to maintain consistent coding styles Removes the following as they are unnecessary: /* -*- Mode: Java; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */ /* vim: set shiftwidth=2 tabstop=2 autoindent cindent expandtab: */	2015-11-14 07:32:18 +05:30
Yury Delendik	fa423cfab0	Refactors fake space heuristics for speed.	2015-11-06 10:55:43 -06:00
Yury Delendik	376f8bde14	Combines standalone divs into text groups.	2015-11-06 10:20:49 -06:00
Yury Delendik	fa46b73c47	Better spacing in text layer.	2015-11-02 08:54:15 -06:00
Jonas Jenwald	1c66d4a106	Add a `totalLength` getter to `OperatorList`, since the `length` is zero after flushing In the `RenderPageRequest` handler in `worker.js`, we attempt to print an `info` message containing the rendering time and the length of the operator list. The latter is currently broken (and has been for quite some time), since the `length` of an `OperatorList` is reset when flushing occurs. This patch attempts to rectify this, by adding a getter which keeps track of the total length.	2015-10-26 18:12:14 +01:00
Yury Delendik	58c3ea0820	Adds thread abort capabilities.	2015-10-23 09:06:32 -05:00
Jonas Jenwald	2e751199fb	Prevent getOperatorList from failing to correctly parse OPS.paintXObject for TilingPatterns that are missing some /Resources entries (issue 6541) Fixes 6541.	2015-10-21 21:30:56 +02:00
Rob Wu	50ff2d4c2a	Ignore operators that are known to be unsupported `operatorList.addOp` adds the arguments to the list which is then passed as-is by postMessage to the main thread. But since we don't parse these operations, they are raw PDF objects and may therefore cause a serialization error. This is a conservative patch, and only affects operators which are known to be unsupported. We should ignore all unknown operators, but I haven't really looked into the consequences of doing that. Fixes #6549	2015-10-21 15:39:25 +02:00
Brendan Dahl	3eaeacfe19	Merge pull request #6476 from Snuffleupagus/PartialEvaluator_readToUnicode-cmap-length Right-size the `map` array in PartialEvaluator_readToUnicode	2015-10-09 10:31:28 -07:00
Jonas Jenwald	1b8cb52555	Prevent `PartialEvaluator_buildFormXObject` from failing if the `Matrix` or `BBox` contains indirect objects This patch fixes yet another instance of bad PDF data, specifically a case where the `BBox` array contains indirect objects (i.e. `Ref`s). Fixes the missing image in http://www.int.washington.edu/talks/WorkShops/int_08_37W/People/Franz_M/Franz.pdf#page=24. Note: There are missing images on a number of the pages in that file.	2015-09-29 10:11:49 +02:00
Jonas Jenwald	8d831449ab	Right-size the `map` array in PartialEvaluator_readToUnicode We can avoid a lot of intermediate resizings, by directly allocating the required number of elements for the `map` array.	2015-09-24 13:08:53 +02:00
Fabian Lange	2564827503	Fix text spacing with vertical fonts (#6387) According to the PDF spec 5.3.2, a positive value means, in horizontal mode, that the next glyph is further to the left (so narrower), and in vertical mode that it is further down (so wider). This change fixes the way PDF.js has interpreted the value.	2015-09-15 09:28:45 +02:00
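
The comparison bug described in commit 2d4a1aa0af (no-op `setGState`) can be illustrated with a minimal sketch. The function `maybeAddSetGState` and the plain-array operator list are hypothetical stand-ins rather than pdf.js code; only the `>=` versus `>` distinction is taken from the commit message.

```javascript
// Hypothetical sketch: `opList` is a plain array, not pdf.js's OperatorList.
function maybeAddSetGState(opList, gStateObj, useBuggyCheck) {
  // `length >= 0` holds for every array, so empty ops slip through;
  // `length > 0` only accepts arrays with at least one entry.
  const hasEntries = useBuggyCheck ? gStateObj.length >= 0
                                   : gStateObj.length > 0;
  if (hasEntries) {
    opList.push(['setGState', gStateObj]);
  }
  return opList;
}

const buggy = maybeAddSetGState([], [], true);
const fixed = maybeAddSetGState([], [], false);
console.log(buggy.length, fixed.length); // prints "1 0"
```

With the corrected check, an empty graphics-state array no longer produces an entry in the list, which appears to be the intent stated for PR 5192.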

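Commit f59c3a0644 prefers `{Name,Cmd}.get` over `new {Name,Cmd}` because the former reuses cached objects. The following is a minimal sketch of that caching pattern, assuming a simplified `Name` class written for illustration; it is not the actual pdf.js implementation.

```javascript
// Hypothetical sketch of a cached factory; not the actual pdf.js `Name` class.
class Name {
  constructor(name) {
    this.name = name;
  }

  // Returns the single shared instance for `name`, creating it on first use.
  static get(name) {
    const cached = Name.cache[name];
    if (cached) {
      return cached;
    }
    return (Name.cache[name] = new Name(name));
  }
}
// Prototype-less object, so keys such as "constructor" cannot collide
// with inherited properties.
Name.cache = Object.create(null);

const sharedIsSame = Name.get('Font') === Name.get('Font');
const freshIsSame = new Name('Font') === new Name('Font');
console.log(sharedIsSame, freshIsSame); // prints "true false"
```

Repeated `Name.get` calls return one object per distinct name, whereas each `new Name` call allocates a separate object, which is the memory concern the commit message raises.
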