pdf.js

Author	SHA1	Message	Date
Brendan Dahl	fcc9943d04	Use charstring as plain text when lengthIV is -1. Fixes #7769	2017-10-18 14:19:59 -07:00
Jonas Jenwald	b1472cddbb	Allow `getOperatorList`/`getTextContent` to skip errors when parsing broken XObjects (issue 8702, issue 8704) This patch makes use of the existing `ignoreErrors` property in `src/core/evaluator.js`, see PRs 8240 and 8441, thus allowing us to attempt to recover as much as possible of a page even when it contains broken XObjects. Fixes 8702. Fixes 8704.	2017-09-29 17:14:21 +02:00
Jonas Jenwald	b8ec518a1e	Split the existing `PDFFunction` in two classes, a private `PDFFunction` and a public `PDFFunctionFactory`, and utilize the latter in `PDFDocument` to allow various code to access the methods of `PDFFunction` Follow-up to PR 8909. This requires us to pass around `pdfFunctionFactory` to quite a lot of existing code, however I don't see another way of handling this while still guaranteeing that we can access `PDFFunction` as freely as in the old code. Please note that the patch passes all tests locally (unit, font, reference), and I very much hope that we have sufficient test-coverage for the code in question to catch any typos/mistakes in the re-factoring.	2017-09-29 15:30:53 +02:00
Jonas Jenwald	5c961c76bb	Remove the unused `inline` parameter from various methods/functions in `PDFImage`, and change a couple of methods to use Objects rather than plain parameters The `inline` parameter is passed to a number of methods/functions in `PDFImage`, despite not actually being used. Its value is never checked, nor is it ever assigned to the current `PDFImage` instance (i.e. no `this.inline = inline` exists). Looking briefly at the history of this code, I was also unable to find a point in time where `inline` was being used. As far as I'm concerned, `inline` does nothing more than add clutter to already very unwieldy method/function signatures, hence why I'm proposing that we just remove it. To further simplify call-sites using `PDFImage`/`NativeImageDecoder`, a number of methods/functions are changed to take Objects rather than a bunch of (somewhat) randomly ordered parameters.	2017-09-29 15:30:40 +02:00
Jonas Jenwald	7d3efe43a2	Ensure that the same exact version of PDF.js is used in both the API and the Worker I don't have a good example at hand right now, but I recall seeing custom deployments of PDF.js that bundle a specific version of the `build/pdf.js` file and then set `PDFJS.workerSrc` to point to https://mozilla.github.io/pdf.js/build/pdf.worker.js. That practice seems really bad since, besides (obviously) causing unnecessary server load, it will very quickly result in a version mismatch between the `pdf.js` and `pdf.worker.js` files in those PDF.js deployments. Such a version mismatch could easily lead to either breaking errors, or even worse slightly inconsistent behaviour for an API call (if the API -> Worker interface changes, which does happen from time to time). To avoid the problems described above, I'm thus proposing that we enforce that the versions of the `pdf.js` and `pdf.worker.js` files must always match.	2017-09-27 15:41:57 +02:00
Brendan Dahl	18e2321845	Overwrite maxSizeOfInstructions in maxp with computed value. In issue #7507 the value is less than the actual max size of the glyph instructions, causing OTS to fail the font.	2017-09-25 17:53:26 -07:00
Jonas Jenwald	10727572a2	Merge pull request #8950 from timvandermeij/polygon-polyline-annotations Implement support for polyline and polygon annotations	2017-09-24 15:16:14 +02:00
Tim van der Meij	c69a7a83da	Merge pull request #8932 from janpe2/jbig2-sym-offset JBIG2 symbol offsets	2017-09-23 17:11:45 +02:00
Tim van der Meij	8ccad276b2	Implement support for polygon annotations	2017-09-23 16:52:47 +02:00
Tim van der Meij	99b17a494d	Implement support for polyline annotations	2017-09-23 16:37:23 +02:00
Jonas Jenwald	8a084aff0f	Remove the `instanceof AlternateCS` check in `ColorSpace.parse` since it's dead code Looking at `ColorSpace.parseToIR`, it will do one of the following things when called: 1. Return a String. 2. Return an Array. 3. Throw a `FormatError`. 4. In one case, return the result of another `ColorSpace.parseToIR` call. However, under no circumstances will it ever return an `AlternateCS` instance. Since it's often useful to understand why code, which has become unused, existed in the first place, let's grab a hard hat and a shovel and start digging through the history of this code :-) The current condition was introduced in commit `c198ec4323`, in PR 794, but it was actually already obsolete by that time. The preceding `instanceof SeparationCS` condition predates commit `a7278b7fbc`, in PR 700. That condition was originally introduced all the way back in commit `4e3f87b60c`, in PR 692. However, it was made obsolete by commit `9dcefe1efc`, which is included in the very same PR! Hence we're left with the conclusion that not only has this code been unused for almost six years, it was basically never used at all save for a few refactoring commits that're part of PR 692.	2017-09-23 14:36:10 +02:00
Jonas Jenwald	abc864fca9	Merge pull request #8938 from brendandahl/bug1392647 Use font's default width even when 0. (bug 1392647)	2017-09-20 22:38:39 +02:00
Brendan Dahl	10ba292b46	Use font's default width even when 0. Bug 1392647 has a PDF where the default width of the font is 0. It draws some charcodes that don't have glyphs, but we were wrongly using the 1000 default width for these charcodes causing some text to be overlapping.	2017-09-20 11:38:30 -07:00
Jani Pehkonen	5d1074c110	Fix JBIG2 symbol offsets in text regions	2017-09-19 23:43:23 +03:00
Jani Pehkonen	3d99b8d706	CCITTFaxStream problem when EndOfBlock is false	2017-09-19 22:19:40 +03:00
Tilman Hausherr	d75a497a6b	support tiff predictor for 16bit (for issue #6289) This does the same for 16 bit as the existing 8 bit tiff predictor code, an addition of the last word to this word. The last two "& 0xFF" may or may not be needed, I see this isn't done in the 8 bit code, but I'm not a JS developer.	2017-09-18 22:24:25 +02:00
Tim van der Meij	400e4aae0e	Implement support for stamp annotations	2017-09-16 16:37:50 +02:00
Tim van der Meij	3be941d982	Merge pull request #8909 from Snuffleupagus/PDFFunction-isEvalSupported Check `isEvalSupported`, and test that `eval` is actually supported, before attempting to use the `PostScriptCompiler` (issue 5573)	2017-09-16 16:11:03 +02:00
Jonas Jenwald	eece66fa3e	For /Filter entries containing `Name`s, ignore the /DecodeParms entry if it contains an Array (issue 8895)	2017-09-15 23:02:16 +02:00
Jonas Jenwald	dc926ffc0f	Check `isEvalSupported`, and test that `eval` is actually supported, before attempting to use the `PostScriptCompiler` (issue 5573) Currently `PDFFunction` is implemented (basically) like a class with only `static` methods. Since it's used directly in a number of different `src/core/` files, attempting to pass in `isEvalSupported` would result in code that's very messy, not to mention difficult to maintain (since every single `PDFFunction` method call would need to include an `isEvalSupported` argument). Rather than having to wait for a possible re-factoring of `PDFFunction` that would avoid the above problems by design, it probably makes sense to at least set `isEvalSupported` globally for `PDFFunction`. Please note that there's one caveat with this solution: If `PDFJS.getDocument` is used to open multiple files simultaneously, with different `PDFJS.isEvalSupported` values set before each call, then the last one will always win. However, that seems like enough of an edge-case that we shouldn't have to worry about it. Besides, since we'll also test that `eval` is actually supported, it should be fine. Fixes 5573.	2017-09-15 12:02:45 +02:00
Tim van der Meij	320779e6ed	Merge pull request #8691 from timvandermeij/square-circle-annotations Implement support for square and circle annotations	2017-09-09 22:56:54 +02:00
Tim van der Meij	44c116ac49	Implement support for circle annotations	2017-09-09 21:36:27 +02:00
Tim van der Meij	cace2e9047	Implement support for square annotations	2017-09-09 21:36:27 +02:00
Jonas Jenwald	8686baede5	Replace `value === (value | 0)` checks with `Number.isInteger(value)` in the `src/` folder Rather than doing what (at first) may seem like a fairly obscure comparison, using `Number.isInteger` will clearly indicate the intent of the code.	2017-09-09 14:12:52 +02:00
Jonas Jenwald	cfb4955a92	Replace the `isArray` helper function with the native `Array.isArray` function Follow-up to PR 8813.	2017-09-01 20:27:13 +02:00
Jonas Jenwald	11408da340	Replace the `isInt` helper function with the native `Number.isInteger` function Follow-up to PR 8643.	2017-09-01 16:52:50 +02:00
Jonas Jenwald	772a5412a4	Avoid some redundant type checks in `XRef.fetchUncompressed` When looking briefly at using `Number.isInteger`/`Number.isNaN` rather than `isInt`/`isNaN`, I noticed that there's a couple of not entirely straightforward cases to consider. At first I really couldn't understand why `parseInt` is being used like it is in `XRef.fetchUncompressed`, since the `num` and `gen` properties of an object reference should always be integers. However, doing a bit of code archaeology pointed to PR 4348, and it thus seems that this was a very deliberate change. Since I didn't want to inadvertently introduce any regressions, I've kept the `parseInt` calls intact but moved them to occur only when actually necessary.[1] Secondly, I noticed that there's a redundant `isCmd` check for an edge-case of broken operators. Since we're throwing a `FormatError` if `obj3` isn't a command, we don't need to repeat that check. In practice, this patch could perhaps be considered as a micro-optimization, but considering that `XRef.fetchUncompressed` can be called many thousand times when loading larger PDF documents these changes at least cannot hurt. --- [1] I even ran all tests locally, with an added `assert(Number.isInteger(obj1) && Number.isInteger(obj2));` check, and everything passed with flying colours. However, since it appears that this was in fact necessary at one point, one possible explanation is that the failing test-case(s) have now been replaced by reduced ones.	2017-08-31 16:49:04 +02:00
Tim van der Meij	a4cc85fc5f	Merge pull request #8828 from timvandermeij/es6-annotations Improve the annotation code by converting to ES6 syntax and removing duplicate code	2017-08-31 00:02:07 +02:00
Jonas Jenwald	49b8cd5a6a	Attempt to improve the `EI` detection heuristics, for inline images, in streams containing `NUL` bytes (issue 8823) Since this patch will now treat (some) `NUL` bytes as "ASCII", the number of `followingBytes` checked are thus increased to (hopefully) reduce the risk of introducing new false positives. Fixes 8823.	2017-08-27 12:48:28 +02:00
Tim van der Meij	2512eccbf0	Implement `getOperatorList` method in the `WidgetAnnotation` class to avoid duplication in subclasses	2017-08-27 01:02:41 +02:00
Tim van der Meij	4f02857394	Let the two annotation factories use static methods This corresponds to how other factories are implemented.	2017-08-27 01:02:40 +02:00
Tim van der Meij	24d741d045	Convert `src/core/annotation.js` to ES6 syntax	2017-08-27 00:53:45 +02:00
Jonas Jenwald	42f2d36d1f	Account for broken outlines/annotations, where the destination dictionary contains an invalid `/Dest` entry According to the specification, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=377, a `Dest` entry in an outline item should not contain a dictionary. Unsurprisingly there are PDF generators that completely ignore this, treating it as an `A` entry instead. The patch also adds a little bit more validation code in `Catalog.parseDestDictionary`.	2017-08-26 17:38:15 +02:00
Jonas Jenwald	4660cf8238	Prevent an infinite loop in `XRef.readXRef` by keeping track of already parsed tables (bug 1393476) With this patch, not only is the infinite loop prevented, but we're also able to actually render the file (which e.g. Adobe Reader isn't able to). Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1393476.	2017-08-24 19:18:08 +02:00
Tim van der Meij	e9ba54940d	Merge pull request #8800 from Snuffleupagus/issue-8798 Try to recover if we reach the end of the stream when searching for the `EI` marker of an inline image (issue 8798)	2017-08-23 23:47:51 +02:00
Jonas Jenwald	ca936ee0c7	Merge pull request #8491 from janpe2/jbig2Halftone-2 JBIG2 halftone regions and pattern dictionaries	2017-08-23 00:13:43 +02:00
Jonas Jenwald	cb55506b95	Try to recover if we reach the end of the stream when searching for the `EI` marker of an inline image (issue 8798)	2017-08-22 09:33:13 +02:00
Jonas Jenwald	2112999db7	Fix caching of small inline images in `Parser.makeInlineImage` (issue 8790) Follow-up to PR 5445. Using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file: ```json [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 50, "type": "eq" } ] ``` I get the following results when comparing `master` against this patch: ``` browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| ---- \| ------ \| ------------- firefox \| Overall \| 50 \| 4694 \| 3974 \| -721 \| -15.35 \| faster firefox \| Page Request \| 50 \| 2 \| 1 \| 0 \| -22.83 \| firefox \| Rendering \| 50 \| 4692 \| 3972 \| -720 \| -15.35 \| faster ``` So, based on these results, it seems like a fairly clear win to fix this broken caching :-)	2017-08-18 23:08:55 +02:00
Jonas Jenwald	563b68e74d	Remove manual clamping code in `src/core/jpx.js` Since we're now using `Uint8ClampedArray`, rather than `Uint8Array`, doing manual clamping shouldn't be necessary given that that is now handled natively. This shouldn't have any measurable performance impact, but just to sanity check that I've done some quick benchmarking with the following manifest file: ```json [ { "id": "S2-eq", "file": "pdfs/S2.pdf", "md5": "d0b6137846df6e0fe058f234a87fb588", "rounds": 100, "type": "eq" } ] ``` which gave the following results against the current `master` (repeated benchmark runs didn't result in any meaningful differences): ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- firefox \| Overall \| 100 \| 592 \| 592 \| 1 \| 0.12 \| firefox \| Page Request \| 100 \| 3 \| 3 \| 0 \| -9.88 \| firefox \| Rendering \| 100 \| 588 \| 589 \| 1 \| 0.18 \| ```	2017-08-16 13:24:28 +02:00
Jonas Jenwald	f6636d6b19	Use `Uint8ClampedArray` when returning image data in `src/core/jbig2.js` and `src/core/jpg.js`	2017-08-16 13:24:28 +02:00
Jonas Jenwald	74ad90cb8f	Update the mask data inversion in `PDFImage.createMask` to be compatible with both `Uint8Array` and `Uint8ClampedArray`	2017-08-16 13:24:21 +02:00
Jonas Jenwald	d6cd5355f0	Use `Uint8ClampedArray`, when returning data, and remove manual clamping in `src/core/jpg.js` (issue 4901) This patch removes the `clamp0to255` helper function, as well as manual clamping code in `src/core/jpg.js`. The adjusted constants in `_convertCmykToRgb` were taken from CMYK to RGB conversion code found in `src/core/colorspace.js`. Please note: There will be some very slight movement in a number of existing test-cases, since `Uint8ClampedArray` appears to use `Math.round` (or equivalent) and the old code used (basically) `Math.floor`.	2017-08-14 16:19:57 +02:00
Jani Pehkonen	9a581ee9ed	Implement JBIG2 halftone regions and pattern dictionaries	2017-08-08 15:38:29 +03:00
Jonas Jenwald	093afd1212	Replace the `coded` property with `isType3Font` when building the font `properties` object in `PartialEvaluator.translateFont` This appears to simply have been forgotten in the re-factoring in PR 4815, where the `coded` property was renamed to the much more descriptive `isType3Font` property.	2017-08-08 14:03:02 +02:00
Jonas Jenwald	4729e96fb7	Remove leftover `args[0].code` checks from the `OPS.paintXObject` cases in evaluator.js From looking at blame, it seems that these checks became obsolete with PR 692 (which landed close to six years ago). Note how, after that PR, there's no longer anything being assigned to the `code` property of an Object.	2017-08-07 10:48:37 +02:00
Jonas Jenwald	ace9de6f7d	Merge pull request #8747 from brendandahl/first-cmap Fix two cmap related issues.	2017-08-04 14:11:12 +02:00
Brendan Dahl	0bef50d56d	Fix two cmap related issues. In issue #8707, there's a char code mapped to a non-existing glyph which shouldn't be drawn. However, we saw it was missing, then tried to use the post table and ended up mapping it incorrectly. This illuminated a problem with issue #5704 and bug 893730, where glyphs disappeared after the above fix. This was from the cmap returning the wrong glyph id, which in turn was caused by the font having multiple cmap tables of the same type and us choosing the last one. Now, we instead default to the first one. I'm unsure if we should instead be merging the multiple cmaps, but using only the first one works.	2017-08-03 22:19:36 -07:00
Yury Delendik	a1dfbec532	Properly cancel streams and guard at getTextContent.	2017-08-03 16:36:46 -05:00
Jonas Jenwald	e20d4a9c21	Merge pull request #8681 from brendandahl/glyph-ids Fix several issues with glyph id mappings (issue 8668, bug 1383504)	2017-08-03 14:25:34 +02:00
Brendan Dahl	5b7f712ca7	Merge pull request #8627 from yurydelendik/issue-8591 Fallback on font widths if CFF data is broken	2017-08-02 10:53:14 -07:00
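
Commit 8686baede5 above replaces `value === (value | 0)` checks with `Number.isInteger(value)`. The following is a minimal sketch of how the two checks compare, not code taken from PDF.js; `isInt32Check` is a hypothetical name introduced only for this illustration. The bitwise form coerces its operand to a signed 32-bit integer, so the two checks agree for small integers but can differ for integers outside that range.

```javascript
// Hypothetical helper for illustration; not a PDF.js function.
function isInt32Check(value) {
  // `value | 0` converts to a signed 32-bit integer, so the comparison
  // only succeeds for integers inside the int32 range.
  return value === (value | 0);
}

// Print both checks side by side for a few sample values.
const samples = [0, 42, -7, 1.5, NaN, 2 ** 31, "5"];
for (const v of samples) {
  console.log(String(v), isInt32Check(v), Number.isInteger(v));
}
```

For ordinary small integers the two checks give the same answer, which is consistent with the commit describing the change as one of readability.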

1214 Commits
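
Commit d6cd5355f0 in the log above notes that switching to `Uint8ClampedArray` moves some test results slightly, because values are rounded rather than floored. The sketch below uses only standard typed arrays to show the difference; the sample inputs are arbitrary and are not taken from the PDF.js test suite.

```javascript
// Arbitrary sample values, chosen to cover negatives, halves, and overflow.
const inputs = [-5, 0.5, 1.5, 2.5, 254.7, 300];

// Uint8Array truncates the fraction and wraps modulo 256.
const wrapped = Array.from(Uint8Array.from(inputs));

// Uint8ClampedArray clamps to [0, 255] and rounds to the nearest
// integer, with exact halves going to the even neighbour.
const clamped = Array.from(Uint8ClampedArray.from(inputs));

console.log(wrapped); // [251, 0, 1, 2, 254, 44]
console.log(clamped); // [0, 0, 2, 2, 255, 255]
```

Note that the rounding is round-half-to-even rather than exactly `Math.round`, which fits the commit's "or equivalent" hedge.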