pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	bd16c363ce	Access the `Catalog` data correctly in the "GetPageIndex" handler in `src/core/worker.js` Even though the code obviously works as-is, given that we have unit-tests for it, it still feels incorrect to just assume that the `Catalog`-instance has all of its properties immediately available. Especially when (almost) all of the other handlers, in `src/core/worker.js`, protect their data accesses with appropriate `pdfManager.ensure` calls.	2020-08-25 12:14:14 +02:00
Jonas Jenwald	2e6e2c3b41	Access the `XRef` data correctly in the "GetStats" handler in `src/core/worker.js` Even though the code obviously works as-is, given that we have unit-tests for it, it still feels incorrect to just assume that the `XRef`-instance has all of its properties immediately available. Especially when (almost) all of the other handlers, in `src/core/worker.js`, protect their data accesses with appropriate `pdfManager.ensure` calls.	2020-08-25 12:14:11 +02:00
Jani Pehkonen	e7febbf0f7	Accent positioning in Type1 `seac` glyphs In `display/canvas.js` the accent offsets must be multiplied by `fontSize` to make the offsets large enough. Another problem is in `core/type1_parser.js` when the Type1 command `seac` is handled. There is an error in the Adobe Type1 spec. See chapter 6 in the Type1 Font Format Supplement, which provides errata: The arguments of `seac` specify the offset of the left side bearing (LSB) points, not the offset of origins. This can be fixed in `core/type1_parser.js` by adding the difference of the LSB values.	2020-08-23 21:01:25 +03:00
Tim van der Meij	a8efc0296b	Obtain the export values for choice widgets from the normal appearance The down appearance (`D`) is optional and not available in the document from #12233, so the checkboxes are never saved/printed as checked: the checked appearance is based on the export value, which is missing because the `D` entry is not available. Instead, we should use the normal appearance (`N`) since that one is required and therefore always available. Finally, the /Off appearance is optional according to section 12.7.4.2.3 of the specification, so that needs to be taken into account to match the specification and to fix reference test failures for the `annotation-button-widget-print` test. That test uses a file that doesn't specify an /Off appearance in the normal appearance dictionary.	2020-08-23 13:00:02 +02:00
Tim van der Meij	1b82ad8fff	Decode widget form values consistently The helper method `_decodeFormValue` is used to ensure that it happens in one place. Note that form values are field values, display values and export values.	2020-08-23 13:00:01 +02:00
Tim van der Meij	12c20772ac	Improve the field value parsing for choice widgets to handle `null` values The specification states that the field value is `null` if no item is selected and we didn't handle this case properly. Even though this did not break the rendering because we always convert the value to an array and the `includes` check in the display layer would simply not match, the field value would be `[null]` which is not expected and strange from an API perspective. This commit fixes that by ensuring that we return an empty array in case the field value is `null`. The API therefore still always gives an array for the field value, but now the code is more specific so that the value is either an empty array or an array of strings.	2020-08-19 23:27:50 +02:00
Jonas Jenwald	1058f16605	Add (basic) support for transfer functions to Images (issue 6931, bug 1149713) This is similar to the existing transfer function support for SMasks, but extended to simple image data. Please note that the extra amount of data now being sent to the main-thread, for affected /ExtGState entries, is limited to at most 4 `Uint8Array`s each with a length of 256 elements. Refer to https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G9.1658137 for additional details.	2020-08-17 10:34:12 +02:00
Jonas Jenwald	9d3e046a4f	Don't cache /ExtGState entries that contain fonts (PR 12087 follow-up) I completely overlooked the fact that `PartialEvaluator.handleSetFont` also updates the current `state`, which means that currently we're not actually handling font data correctly for cached /ExtGState data. (Thankfully, using /ExtGState to set a font is somewhat rare in practice.)	2020-08-17 08:17:25 +02:00
Calixte Denizet	1a6816ba98	Add support for saving forms	2020-08-12 10:32:59 +02:00
Brendan Dahl	7fb01f9f2a	Merge pull request #12186 from brendandahl/loca-2 Fix bad truetype loca tables.	2020-08-10 20:34:19 -07:00
Brendan Dahl	f6dff81223	Fix bad truetype loca tables. Some fonts have loca tables that aren't sorted or use 0 as an offset to signal a missing glyph. This fixes the bad loca tables by sorting them and then rewriting the loca table and potentially re-ordering the glyf table to match. Fixes #11131 and bug 1650302.	2020-08-10 14:15:49 -07:00
Calixte Denizet	88b112ab0c	Support comb textfields for printing	2020-08-09 14:41:26 +02:00
Calixte Denizet	cd8bb7293b	Support multiline textfields for printing	2020-08-09 12:14:34 +02:00
Calixte Denizet	1747d259f9	Support textfield and choice widgets for printing	2020-08-06 14:45:23 +02:00
Brendan Dahl	ac494a2278	Add support for optional marked content. Add a new method to the API to get the optional content configuration. Add a new render task param that accepts the above configuration. For now, the optional content is not controllable by the user in the viewer, but renders with the default configuration in the PDF. All of the test files added exhibit different uses of optional content. Fixes #269. Fix test to work with optional content. - Change the stopAtErrors test to ensure the operator list has something, instead of asserting the exact number of operators.	2020-08-04 09:26:55 -07:00
Tim van der Meij	5a66c56eca	Merge pull request #12108 from calixteman/radio Add support for radios printing	2020-08-02 14:47:46 +02:00
Jonas Jenwald	6d192f987e	Prevent `Uncaught (in promise) AbortException` when running the unit-tests These errors can/will occur if data is still loading when the document is destroyed, which is the case in the API unit-tests that load the `tracemonkey.pdf` file. While this patch prevents these kinds of problems, and thus allows us to update Jasmine again, I cannot help but think that it's slightly "hacky". Basically, we'll simply catch and ignore (some) rejected promises once the document is destroyed and/or its data loading is aborted. However, I don't think that these changes should cause issues in general, since we don't really care about errors once document destruction has started (note e.g. the fair number of `catch` handlers ignoring `AbortException`s already).	2020-07-31 23:29:05 +02:00
Calixte Denizet	538017f7a7	Add support for radios printing	2020-07-31 14:31:49 +02:00
Tim van der Meij	eb4d6a0652	Merge pull request #12107 from calixteman/checkbox Add support for checkboxes printing	2020-07-30 00:11:41 +02:00
Calixte Denizet	cb60523a15	Add support for checkboxes printing	2020-07-29 16:42:57 +02:00
Jonas Jenwald	fbe90b63ec	[src/core/worker.js] Remove a useless Promise handler from the `pdfManagerReady` function Looking carefully at this code, you'll notice that the `loadDocument` function has no less than three Promise handling functions. This obviously makes no sense, since a Promise can only have one resolve and one reject handler. Hence the final `onFailure`-case is unreachable, which only serves to add confusion when reading the code. Note that this code has been re-factored more than once over the years, but it seems as if this may even have been incorrect already in PR 3310 (and no-one has noticed for seven years :-).	2020-07-28 14:51:50 +02:00
Jonas Jenwald	835b5ffddd	Only check `isType3Font` the first time that `TranslatedFont.loadType3Data` is called If the `TranslatedFont.type3Loaded` property exists, then you already know that the font must be a Type3 one.	2020-07-27 13:20:15 +02:00
Jonas Jenwald	f3ff526019	Send/receive Type3 images the same way as other globally-cached images There's quite frankly no particular reason to special-case Type3-fonts with image resources, which are very rare anyway, now that we have a general mechanism for sending/receiving images globally.	2020-07-27 13:20:15 +02:00
Jonas Jenwald	7c9d0d5939	Improve how Type3-fonts with dependencies are handled While the `CharProcs` streams of Type3-fonts usually don't rely on dependencies, such as e.g. images, it does happen in some cases. Currently any dependencies are simply appended to the parent operatorList, which in practice means only the operatorList of the first page where the Type3-font is being used. However, there's one thing that's slightly unfortunate with that approach: Since fonts are global to the PDF document, we really ought to ensure that any Type3 dependencies are appended to the operatorList of all pages where the Type3-font is being used. Otherwise there's a theoretical risk that, if one page has its rendering paused, another page may try to use a Type3-font whose dependencies are not yet fully resolved. In that case there would be errors, since Type3 operatorLists are executed synchronously. Hence this patch, which ensures that all relevant pages will have Type3 dependencies appended to the main operatorList. (Note here that the `OperatorList.addDependencies` method, via `OperatorList.addDependency`, ensures that a dependency is only added once to any operatorList.) Finally, these changes also remove the need for the "waiting for the main-thread"-hack that was added to `PartialEvaluator.buildPaintImageXObject` as part of fixing issue 10717.	2020-07-27 13:20:13 +02:00
Calixte Denizet	584902dbf8	Add an annotation storage in order to save annotation data in acroforms	2020-07-24 10:50:11 +02:00
Jonas Jenwald	684a7b89ac	Remove unnecessary duplication in the `addChildren` helper function (used by the `ObjectLoader`) Besides being fewer lines of code overall, this also avoids one `node instanceof Dict` check for both of the `Dict`/`Stream`-cases.	2020-07-17 16:32:24 +02:00
Jonas Jenwald	ea8e432c45	Add a `getRawValues` method, to `Dict` instances, to provide an easier way of getting all raw values When the old `Dict.getAll()` method was removed, it was replaced with a `Dict.getKeys()` call and `Dict.get(...)` calls (in a loop). While this pattern obviously makes a lot of sense in many cases, there are some instances where we actually want the raw `Dict` values (i.e. `Ref`s where applicable). In those cases, `Dict.getRaw(...)` calls are instead used within the loop. However, by introducing a new `Dict.getRawValues()` method we can reduce the number of (strictly unnecessary) function calls by simply getting the raw `Dict` values directly.	2020-07-17 16:32:00 +02:00
Jonas Jenwald	6381b5b08f	Add a `size` getter, to `Dict` instances, to provide an easier way of checking the number of entries This removes the need to manually call `Dict.getKeys()` and check its length.	2020-07-17 16:06:11 +02:00
Tim van der Meij	e63d1ebff5	Merge pull request #12087 from Snuffleupagus/LocalGStateCache Add local caching of "simple" Graphics State (ExtGState) data in `PartialEvaluator.{getOperatorList, getTextContent}` (issue 2813)	2020-07-17 16:02:45 +02:00
Tim van der Meij	b19a1796ac	Convert `RefSetCache` to a proper class and to use a `Map` internally Using a `Map` instead of an `Object` provides some advantages such as cheaper ways to get the size of the cache, to find out if an entry is contained in the cache and to iterate over the cache. Moreover, we can clear and re-use the same `Map` object now instead of creating a new one.	2020-07-17 13:35:29 +02:00
Jonas Jenwald	b3480842b3	Use a `RefSet`, rather than a plain Object, for tracking already processed nodes in `PartialEvaluator.hasBlendModes`	2020-07-17 09:52:36 +02:00
Jonas Jenwald	03547b5633	Change `PartialEvaluator.setGState` to an `async` method Since this method calls `Dict.get` to fetch data, there could thus be `Error`s thrown in corrupt PDF documents when attempting to resolve an indirect object. To ensure that this won't ever become a problem, we change the method to be `async` such that a rejected Promise would be returned and general OperatorList parsing won't break.	2020-07-15 14:27:18 +02:00
Jonas Jenwald	f20aeb9343	Slightly simplify the code in `PartialEvaluator.hasBlendModes`, e.g. by using `for...of` loops - Replace the existing loops with `for...of` variants instead. - Make use of `continue`, to reduce indentation and to make the code (slightly) easier to follow, when checking `/Resources` entries.	2020-07-15 12:47:11 +02:00
Jonas Jenwald	15fa3f8518	Remove a redundant `/XObject` stream dictionary `objId` check in `PartialEvaluator.hasBlendModes` (PR 6971 follow-up) This case should no longer happen, given the `instanceof Ref` branch just above (added in PR 6971). Also, I've run the entire test-suite locally with `continue` replaced by `throw new Error(...)` and didn't find any problems.	2020-07-15 12:47:11 +02:00
Jonas Jenwald	84476da26e	Handle lookup errors "silently" in `PartialEvaluator.hasBlendModes` (PR 11680 follow-up) Given that this method is used during what's essentially a pre-parsing stage, before the actual OperatorList parsing occurs, on second thought it doesn't seem at all necessary to warn and trigger fallback in cases where there are lookup errors. Please note: Any errors will still be either suppressed or thrown, according to the `ignoreErrors` option, during the actual OperatorList parsing.	2020-07-15 12:47:07 +02:00
Jonas Jenwald	981ff41b5f	Add local caching of non-font Graphics State (ExtGState) data in `PartialEvaluator.getTextContent` It turns out that `getTextContent` suffers from similar problems with repeated GStates as `getOperatorList`; please see the previous patch. While only `/ExtGState` resources containing Fonts will actually be parsed by `PartialEvaluator.getTextContent`, we're still forced to fetch/validate repeated `/ExtGState` resources even though most of them won't affect the textContent (since they mostly contain purely graphical state). With these changes we also no longer need to immediately reset the current text-state when encountering a `setGState` operator, which may thus improve text-selection in some cases.	2020-07-14 10:34:43 +02:00
Jonas Jenwald	90eb579713	Add local caching of "simple" Graphics State (ExtGState) data in `PartialEvaluator.getOperatorList` (issue 2813) This patch will help pathological cases the most, with issue 2813 being a particularly problematic example. While there are only four `/ExtGState` resources, there's a total of `29062` `setGState` operators. Even though parsing of a single `/ExtGState` resource is quite fast, having to re-parse them thousands of times does add up quite significantly. For simplicity we'll only cache "simple" `/ExtGState` resources, since e.g. the general `SMask` case cannot be easily cached (without re-factoring other code, which may have undesirable effects on general parsing). By caching "simple" `/ExtGState` resources, we thus improve performance by: - Not having to fetch/validate/parse the same `/ExtGState` data over and over. - Handling of repeated `setGState` operators becomes synchronous during the `OperatorList` building, instead of having to defer to the event-loop/microtask-queue since the `/ExtGState` parsing is done asynchronously. --- Obviously I had intended to include (standard) benchmark results with this patch, but for reasons I don't understand the test run-time (even with `master`) of the document in issue 2813 is a lot slower than in the development viewer (making normal benchmarking infeasible). However, testing this manually in the development viewer (using `pdfBug=Stats`) shows a reduction of `~10 %` in the rendering time of the PDF document in issue 2813.	2020-07-14 10:34:43 +02:00
Jonas Jenwald	d4d7ac1b88	Stop special-casing the (very unlikely) "no `/XObject` found"-scenario, when parsing `OPS.paintXObject` operators, in `PartialEvaluator.{getOperatorList, getTextContent}` Originally there weren't any (generally) good ways to handle errors gracefully, on the worker-side, however that's no longer the case and we can simply fall back to the existing `ignoreErrors` functionality instead. Also, please note that the "no `/XObject` found"-scenario should be extremely unlikely in practice and would only occur in corrupt/broken documents. Note that the `PartialEvaluator.getOperatorList` case is especially bad currently, since we'll simply (attempt to) send the data as-is to the main-thread. This is quite bad, since in a corrupt/broken document the data could contain anything and e.g. be unclonable (which would cause breaking errors). Also, we're (obviously) not attempting to do anything with this "raw" `OPS.paintXObject` data on the main-thread and simply ensuring that we never send it definitely seems like the correct approach.	2020-07-12 21:59:59 +02:00
Tim van der Meij	7dabc5ecc8	Merge pull request #12063 from Snuffleupagus/issue-10989 Tweak the heuristic, in `src/core/jpg.js`, that handles JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10989)	2020-07-11 00:05:11 +02:00
Jonas Jenwald	d18cf47419	Remove the special handling, used when creating Indexed ColorSpaces, for the case where the `lookup`-data is a `Stream` This special-case was added in PR 1992, however it became unnecessary with the changes in PR 4824 since all of the ColorSpace parsing is now done on the worker-thread (with only RGB-data being sent to the main-thread).	2020-07-10 17:22:55 +02:00
Jonas Jenwald	ea6a0e4435	Remove the IR (internal representation) part of the ColorSpace parsing Originally ColorSpaces were only partially parsed on the worker-thread, to obtain an IR-format which was sent to the main-thread. This had the somewhat unfortunate side-effect of causing the majority of the (potentially heavy) ColorSpace parsing to happen on the main-thread. Hence PR 4824 which, among other things, changed ColorSpaces to be fully parsed on the worker-thread with only RGB-data being sent to the main-thread. While it thus originally was necessary to have `ColorSpace.{parseToIR, fromIR}` methods, to handle the worker/main-thread split, that's no longer the case and we can thus reduce all of the ColorSpace parsing to one method instead. Currently, when parsing a ColorSpace, we call `ColorSpace.parseToIR` which parses the ColorSpace-data from the document and then creates the IR-format. We then, immediately, call `ColorSpace.fromIR` which parses the IR-format and then finally creates the actual ColorSpace.[1] All-in-all, this leads to a fair amount of unnecessary indirection which also (in my opinion) makes the code less clear. Obviously these changes are not really expected to have a significant effect on performance, especially with the recently added caching of ColorSpaces, however there'll now be strictly fewer function calls, less memory allocated, and overall less parsing required during ColorSpace-handling. --- [1] For ICCBased ColorSpaces, given the validation necessary, this currently even leads to parsing an /Alternate ColorSpace twice.	2020-07-10 17:22:44 +02:00
Jonas Jenwald	4cc6797f17	Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it Note how the `getFontID`-method in `src/core/fonts.js` is completely global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the same PDF document the `fontID`s will still be incremented continuously. For comparison the `createObjId` method, on `idFactory`, will always create a consistent id, assuming of course that the document and its pages are parsed/rendered in the same order. In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.) Please note: Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.	2020-07-07 16:33:31 +02:00
Jonas Jenwald	1d66fce781	Tweak the heuristic, in `src/core/jpg.js`, that handles JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10989)	2020-07-06 13:06:49 +02:00
Jonas Jenwald	c95fbb6e21	Convert the code in `src/core/evaluator.js` to use standard classes This removes additional `// eslint-disable-next-line no-shadow` usage, which our old pseudo-classes necessitated. Most of the re-formatting changes, after the `class` definitions and methods were fixed, were done automatically by Prettier. Please note: I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.	2020-07-05 16:01:04 +02:00
Jonas Jenwald	32a0b6fa73	Move some constants and helper functions out of the `PartialEvaluator` closure This will simplify the `class` conversion in the next patch, and with modern JavaScript the moved code is still limited to the current module scope. Please note: For improved consistency with our usual formatting, the `TILING_PATTERN`/`SHADING_PATTERN` constants were re-factored slightly.	2020-07-05 15:56:23 +02:00
Tim van der Meij	c4255fdbfd	Merge pull request #12059 from Snuffleupagus/image-class Convert the code in `src/core/image.js` to use ES6 classes	2020-07-05 14:08:55 +02:00
Jonas Jenwald	59da1d5829	Convert the code in `src/core/image.js` to use ES6 classes This removes additional `// eslint-disable-next-line no-shadow` usage, which our old pseudo-classes necessitated. Please note: I'm purposely not doing any `var` to `let`/`const` conversion here, since it's generally better to (if possible) do that automatically on e.g. a directory basis instead.	2020-07-05 09:34:14 +02:00
Jonas Jenwald	85ced3fbfd	Allow `BaseLocalCache` to, optionally, only allocate storage for caching of references (PR 12034 follow-up) Yet another instalment in the never-ending series of things that you think of __after__ a patch has landed. Since `Function`s are only cached by reference, we thus don't need to allocate storage for names in `LocalFunctionCache` instances. Obviously the effect of these changes is really tiny, but it seems reasonable in principle to avoid allocating data structures that are guaranteed to be unused.	2020-07-04 15:01:32 +02:00
Jonas Jenwald	ca719ecaa4	Add local caching of `Function`s, by reference, in the `PDFFunctionFactory` (issue 2541) Note that compared to other structures, such as e.g. Images and ColorSpaces, `Function`s are not referred to by name, which however does bring the advantage of being able to share the cache for an entire page. Furthermore, similar to ColorSpaces, the parsing of individual `Function`s is generally fast enough to not really warrant trying to cache them in any "smarter" way than by reference. (Hence trying to do caching similar to e.g. Fonts would most likely be a losing proposition, given the amount of data lookup/parsing that'd be required.) Originally I tried implementing this similar to e.g. the recently added ColorSpace caching (and in a couple of different ways), however it unfortunately turned out to be quite ugly/unwieldy given the sheer number of functions/methods where you'd thus need to pass in a `LocalFunctionCache` instance. (Also, the affected functions/methods didn't exactly have short signatures as-is.) After going back and forth on this for a while it seemed to me that the simplest, or least "invasive" if you will, solution would be if each `PartialEvaluator` instance had its own `PDFFunctionFactory` instance (since the latter is already passed to all of the required code). This way each `PDFFunctionFactory` instance could have a local `Function` cache, without it being necessary to provide a `LocalFunctionCache` instance manually at every `PDFFunctionFactory.{create, createFromArray}` call-site. Obviously, with this patch, there are now (potentially) more `PDFFunctionFactory` instances than before, when the entire document shared just one. However, each such instance is really quite small and it's also tied to a `PartialEvaluator` instance and those are not kept alive and/or cached.
To reduce the impact of these changes, I've tried to make as many of these structures as possible lazily initialized, specifically: - The `PDFFunctionFactory`, on `PartialEvaluator` instances, since not all kinds of general parsing actually require it. For example: `getTextContent` calls won't cause any `Function` to be parsed, and even some `getOperatorList` calls won't trigger `Function` parsing (if a page contains e.g. no Patterns or "complex" ColorSpaces). - The `LocalFunctionCache`, on `PDFFunctionFactory` instances, since only certain parsing requires it. Generally speaking, only e.g. Patterns, "complex" ColorSpaces, and/or (some) SoftMasks will trigger any `Function` parsing. To put these changes into perspective, when loading/rendering all (14) pages of the default `tracemonkey.pdf` file there's now a total of 6 `PDFFunctionFactory` and 1 `LocalFunctionCache` instances created thanks to the lazy initialization. (If you instead would keep the document-"global" `PDFFunctionFactory` instance and pass around `LocalFunctionCache` instances everywhere, the numbers for the `tracemonkey.pdf` file would instead be something like 1 `PDFFunctionFactory` and 6 `LocalFunctionCache` instances.) All-in-all, I thus don't think that the `PDFFunctionFactory` changes should be generally problematic. With these changes, we can also modify (some) call-sites to pass in a `Reference` rather than the actual `Function` data. This is nice since `Function`s can also be `Streams`, which are not cached on the `XRef` instance (given their potential size), and this way we can avoid unnecessary lookups and thus save some additional time/resources. Obviously I had intended to include (standard) benchmark results with these changes, but for reasons I don't really understand the test run-time (even with `master`) of the document in issue 2541 is quite a bit slower than in the development viewer.
However, logging the time it takes for the relevant `PDFFunctionFactory`/`PDFFunction` parsing shows that it takes approximately `0.5 ms` for the `Function` in question. Looking up a cached `Function`, on the other hand, is one order of magnitude faster, which does add up when the same `Function` is invoked close to 2000 times.	2020-07-04 00:55:18 +02:00
Jonas Jenwald	28d2ada59c	Attempt to detect inline images which contain "EI" sequence in the actual image data (issue 11124) This should reduce the possibility of accidentally truncating some inline images, while not causing the "EI" detection to become significantly slower.[1] There's obviously a possibility that these added checks are not sufficient to catch every single case of "EI" sequences within the actual inline image data, but without specific test-cases I decided against over-engineering the solution here. Please note: The interpolation issues are somewhat orthogonal to the main issue here, which is the truncated image, and it's already tracked elsewhere. --- [1] I've looked at the issue a few times, and this is the first approach that I was able to come up with that didn't cause unacceptable performance regressions in e.g. issue 2618.	2020-06-26 13:15:06 +02:00
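
The arithmetic behind e7febbf0f7 (accent positioning in Type1 `seac` glyphs) can be sketched as follows. Since `adx` is the offset between the left side bearing points rather than between the origins, the accent's origin lands at `adx + baseLsb - asb` relative to the base glyph's origin, and the result is then scaled by the font size. This is a minimal JavaScript sketch, assuming a 1000-unit glyph space; `seacAccentOriginX` and `toUserSpace` are hypothetical names, not pdf.js functions.

```javascript
// Hypothetical helpers (not pdf.js code) for the seac accent offset.
//   baseLsb: left side bearing of the base/composite glyph (from hsbw)
//   asb:     left side bearing of the accent (first seac argument)
//   adx:     x-offset between the two LSB points, per the errata
function seacAccentOriginX(baseLsb, asb, adx) {
  // Each LSB point sits at origin + lsb, and accentLsbPoint = baseLsbPoint + adx,
  // so accentOrigin - baseOrigin = baseLsb + adx - asb.
  return adx + baseLsb - asb;
}

// Scale a glyph-space offset to user space, assuming 1000 units per em.
function toUserSpace(glyphUnits, fontSize, unitsPerEm = 1000) {
  return (glyphUnits / unitsPerEm) * fontSize;
}

const dx = seacAccentOriginX(50, 30, 100); // 120 glyph units
console.log(dx, toUserSpace(dx, 12));
```

The final multiplication by `fontSize` corresponds to the change the commit describes for `display/canvas.js`.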
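
Commit 12c20772ac describes returning an empty array, rather than `[null]`, when a choice widget's field value is `null`, so the API always yields an array of strings. A minimal JavaScript sketch of that normalization; `normalizeChoiceFieldValue` is a hypothetical helper, and filtering out non-string entries is an added assumption.

```javascript
// Hypothetical helper sketching the behavior described in 12c20772ac.
function normalizeChoiceFieldValue(value) {
  // A `null` field value means that no item is selected.
  if (value === null || value === undefined) {
    return [];
  }
  // A single selection may be stored as a plain string.
  const values = Array.isArray(value) ? value : [value];
  // Assumption: keep only strings, so callers always get string[].
  return values.filter(item => typeof item === "string");
}

console.log(normalizeChoiceFieldValue(null)); // []
console.log(normalizeChoiceFieldValue("Apple")); // [ 'Apple' ]
console.log(normalizeChoiceFieldValue(["Apple", "Pear"]).includes("Pear")); // true
```

With an empty array, an `includes` check in the display layer simply finds no match, the same outcome as before, but without the odd `[null]` value.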
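
Commit 1058f16605 limits the transfer-function data to at most four 256-entry `Uint8Array`s, which suggests per-component lookup tables. The JavaScript sketch below samples a transfer function into such a table and applies tables to RGBA pixels; both function names are hypothetical, and only the red/green/blue tables are applied because how pdf.js uses the fourth table is not stated here.

```javascript
// Hypothetical sketch (not pdf.js code): sample a transfer function into a
// 256-entry lookup table.
function sampleTransferFunction(fn) {
  const map = new Uint8Array(256);
  for (let i = 0; i < 256; i++) {
    // fn maps [0, 1] -> [0, 1]; clamp before storing, since Uint8Array wraps.
    const v = Math.round(fn(i / 255) * 255);
    map[i] = Math.max(0, Math.min(255, v));
  }
  return map;
}

// Apply per-channel tables to RGBA data in place. `maps` is
// [red, green, blue]; a `null` entry leaves that channel unchanged.
function applyTransferMaps(rgba, maps) {
  const [red, green, blue] = maps;
  for (let i = 0; i < rgba.length; i += 4) {
    if (red) rgba[i] = red[rgba[i]];
    if (green) rgba[i + 1] = green[rgba[i + 1]];
    if (blue) rgba[i + 2] = blue[rgba[i + 2]];
  }
  return rgba;
}

const invert = sampleTransferFunction(v => 1 - v);
const pixels = Uint8ClampedArray.from([10, 20, 30, 255]);
applyTransferMaps(pixels, [invert, null, null]);
console.log(Array.from(pixels)); // -> 245, 20, 30, 255
```

Precomputing a table once per /ExtGState entry keeps the per-pixel work to a single array lookup per channel.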
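
Commit f6dff81223 repairs bad TrueType `loca` tables by sorting the entries, rewriting `loca`, and, if needed, re-ordering `glyf`. The JavaScript sketch below illustrates that idea and is not pdf.js's `fonts.js` code: `repairLocaAndGlyf` is a hypothetical name, each glyph is assumed to end where the next larger start offset begins, and glyphs that share a start offset (such as the 0-offset "missing glyph" case) simply share the same data.

```javascript
// Hypothetical sketch. `offsets` holds numGlyphs + 1 loca entries (only the
// first numGlyphs are used as start positions); `glyf` is a Uint8Array.
function repairLocaAndGlyf(offsets, glyf) {
  const numGlyphs = offsets.length - 1;
  const entries = [];
  for (let index = 0; index < numGlyphs; index++) {
    // Clamp out-of-range offsets to the end of the glyf table.
    entries.push({ index, start: Math.min(offsets[index], glyf.length), end: 0 });
  }

  // Sort by start offset; a glyph ends where the next larger offset starts.
  const sorted = entries.slice().sort((a, b) => a.start - b.start || a.index - b.index);
  let nextStart = glyf.length;
  for (let i = sorted.length - 1; i >= 0; i--) {
    sorted[i].end = nextStart;
    // Glyphs with the same start offset get the same end offset.
    if (i > 0 && sorted[i - 1].start !== sorted[i].start) {
      nextStart = sorted[i].start;
    }
  }

  // Rebuild glyf in glyph order and emit a monotonic loca table.
  let total = 0;
  for (const e of entries) total += e.end - e.start;
  const newGlyf = new Uint8Array(total);
  const newOffsets = [0];
  let pos = 0;
  for (const e of entries) {
    newGlyf.set(glyf.subarray(e.start, e.end), pos);
    pos += e.end - e.start;
    newOffsets.push(pos);
  }
  return { newOffsets, newGlyf };
}

// Glyph 0's data is stored after glyph 1's, so the loca is unsorted.
const glyf = Uint8Array.from([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]);
const { newOffsets, newGlyf } = repairLocaAndGlyf([4, 0, 10], glyf);
console.log(newOffsets); // [ 0, 6, 10 ]
console.log(Array.from(newGlyf)); // [ 4, 5, 6, 7, 8, 9, 0, 1, 2, 3 ]
```

Copying shared data for tied offsets is only one possible policy; emitting an empty glyph for the 0-offset case would be an equally plausible choice.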
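
Commits 6381b5b08f and ea8e432c45 add a `size` getter and a `getRawValues()` method to `Dict`. The JavaScript sketch below shows the shape of those two additions on a simplified stand-in; `MiniDict`, its `resolve` callback, and the tiny `Ref` class are hypothetical and do not reproduce pdf.js's actual `Dict`/`Ref` implementations.

```javascript
// Hypothetical, simplified stand-ins; not pdf.js's `Ref`/`Dict` classes.
class Ref {
  constructor(num, gen) {
    this.num = num;
    this.gen = gen;
  }
}

class MiniDict {
  // `resolve` is an assumed callback that dereferences indirect objects.
  constructor(resolve = value => value) {
    this._map = Object.create(null);
    this._resolve = resolve;
  }

  set(key, value) {
    this._map[key] = value;
  }

  getKeys() {
    return Object.keys(this._map);
  }

  // Resolved lookup, in the spirit of `Dict.get`.
  get(key) {
    return this._resolve(this._map[key]);
  }

  // Unresolved lookup, in the spirit of `Dict.getRaw`.
  getRaw(key) {
    return this._map[key];
  }

  // Entry count, so callers no longer need `getKeys().length`.
  get size() {
    return Object.keys(this._map).length;
  }

  // All raw values (e.g. `Ref`s) at once, instead of a getRaw() loop.
  getRawValues() {
    return Object.values(this._map);
  }
}

const objects = new Map([[5, "resolved object 5"]]);
const dict = new MiniDict(v => (v instanceof Ref ? objects.get(v.num) : v));
dict.set("Font", new Ref(5, 0));
dict.set("Count", 3);
console.log(dict.size); // 2
console.log(dict.getRawValues()); // [ Ref { num: 5, gen: 0 }, 3 ]
console.log(dict.get("Font")); // resolved object 5
```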
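
Commit b19a1796ac switches `RefSetCache` from a plain `Object` to a `Map`, citing a cheap size, membership checks, iteration, and re-use via `clear()`. A minimal JavaScript sketch of a Map-backed cache along those lines; references are assumed to be `{ num, gen }` objects, and the string key format is an illustrative choice rather than pdf.js's.

```javascript
// Hypothetical sketch of a Map-backed RefSetCache (see b19a1796ac).
class RefSetCache {
  constructor() {
    this._map = new Map();
  }

  static _key(ref) {
    return `${ref.num}R${ref.gen}`;
  }

  // `Map.prototype.size` is a property, so no key array is built.
  get size() {
    return this._map.size;
  }

  get(ref) {
    return this._map.get(RefSetCache._key(ref));
  }

  has(ref) {
    return this._map.has(RefSetCache._key(ref));
  }

  put(ref, value) {
    this._map.set(RefSetCache._key(ref), value);
  }

  forEach(callback) {
    for (const value of this._map.values()) {
      callback(value);
    }
  }

  // Empty and re-use the same Map, rather than allocating a new object.
  clear() {
    this._map.clear();
  }
}

const cache = new RefSetCache();
cache.put({ num: 12, gen: 0 }, "font data");
console.log(cache.has({ num: 12, gen: 0 }), cache.size); // true 1
cache.clear();
console.log(cache.size); // 0
```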
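
Commit 4cc6797f17 moves `fontID` generation from a module-global counter into the per-document `idFactory`, so that re-opening the same document yields the same ids. A minimal JavaScript sketch of the difference; `createIdFactory`, `getFontIdGlobal`, and the id formats are hypothetical, and only the `createObjId`/`createFontId` method names come from the commit message.

```javascript
// Hypothetical per-document id factory; id formats are illustrative.
function createIdFactory(docId) {
  let objIdCounter = 0;
  let fontIdCounter = 0;
  return {
    createObjId() {
      return `${docId}_obj${++objIdCounter}`;
    },
    createFontId() {
      return `f${++fontIdCounter}`;
    },
  };
}

// Old pattern: a module-global counter keeps growing across documents.
let globalFontCounter = 0;
function getFontIdGlobal() {
  return `f${++globalFontCounter}`;
}

const firstLoad = createIdFactory("d0");
const secondLoad = createIdFactory("d0"); // same document, opened again
console.log(firstLoad.createFontId(), secondLoad.createFontId()); // f1 f1
console.log(getFontIdGlobal(), getFontIdGlobal()); // f1 f2
```

As the commit notes, the document id remains part of the font's `loadedName`, so restarting the counter per document does not make font names collide across documents.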
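
Commits ca719ecaa4 and 85ced3fbfd give each evaluator its own lazily created function factory, whose cache is itself allocated on first use and keyed only by reference. The JavaScript sketch below models that arrangement; `EvaluatorSketch`, `FunctionFactorySketch`, the `parseFunction` callback, and the `{ num, gen }` reference shape are all hypothetical stand-ins for `PartialEvaluator`, `PDFFunctionFactory`, and `LocalFunctionCache`.

```javascript
// Hypothetical sketch of per-evaluator Function caching by reference.
class FunctionFactorySketch {
  constructor(parseFunction) {
    this._parse = parseFunction;
    this._cache = null; // allocated on first use only
  }

  createFromRef(ref) {
    // Functions are only cached by reference, so no name map is kept.
    if (!this._cache) {
      this._cache = new Map();
    }
    const key = `${ref.num}R${ref.gen}`;
    if (!this._cache.has(key)) {
      this._cache.set(key, this._parse(ref));
    }
    return this._cache.get(key);
  }
}

class EvaluatorSketch {
  constructor(parseFunction) {
    this._parseFunction = parseFunction;
    this._functionFactory = null;
  }

  // Created on demand, since e.g. text extraction never parses Functions.
  get functionFactory() {
    if (!this._functionFactory) {
      this._functionFactory = new FunctionFactorySketch(this._parseFunction);
    }
    return this._functionFactory;
  }
}

let parseCount = 0;
const evaluator = new EvaluatorSketch(ref => {
  parseCount++;
  return x => x * ref.num; // stand-in for a parsed PDF Function
});
const fnRef = { num: 7, gen: 0 };
for (let i = 0; i < 2000; i++) {
  evaluator.functionFactory.createFromRef(fnRef)(0.5);
}
console.log(parseCount); // 1
```

Keeping the cache inside the factory avoids threading a cache argument through every call-site, which is the trade-off the commit message describes.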

1753 Commits