Sakurai/pdf.js - pdf.js - Gitea on kemo

Sakurai/pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	b773b356af	Convert `NameOrNumberTree`, `NameTree`, and `NumberTree` to ES6 classes Also changes `var` to `let`/`const` in code already touched in the patch.	2018-07-09 21:12:01 +02:00
Jonas Jenwald	56e3648b65	Add basic validation of the 'trailer' dictionary candidates in `XRef.indexObjects` (issue 9418) This patch avoids choosing a (possible) 'trailer' dictionary that `XRef.parse` and/or the `Catalog` constructor/methods will reject anyway. Since `XRef.indexObjects` is already parsing the entire PDF file, the extra dictionary look-ups added here shouldn't matter much. Besides, this is a fallback code-path that only applies to corrupt PDF files anyway.	2018-06-20 13:41:22 +02:00
Jonas Jenwald	346810e02a	Add basic validation of the 'Root' dictionary in `XRef.parse` and try to recover when possible Note that the `Catalog` constructor, and some of its methods, are already enforcing that the 'Root' dictionary is valid/well-formed. However, by doing additional validation already in `XRef.parse` there's a slightly larger chance that corrupt PDF files could be successfully parsed/rendered.	2018-06-20 13:41:22 +02:00
Jonas Jenwald	e84813e7cc	Prevent hard errors if fetching the `Encrypt` dictionary fails in `XRef.parse`	2018-06-20 13:41:22 +02:00
Jonas Jenwald	30ad62a86a	Use the correct `startPos` when repeating the search for 'endobj' operators in `XRef.indexObjects` (PR 9288 follow-up)	2018-06-20 13:41:22 +02:00
Jonas Jenwald	731f2e6dfc	Remove manual clamping/rounding from `ColorSpace` and `PDFImage`, by having their methods use `Uint8ClampedArray`s The built-in image decoders are already using `Uint8ClampedArray` when returning data, and this patch simply extends that to the rest of the image/colorspace code. As far as I can tell, the only reason for using manual clamping/rounding in the first place was because TypedArrays used to be polyfilled (using regular arrays). And trying to polyfill the native clamping/rounding would probably have been had too much overhead, but given that TypedArray support is required in PDF.js version `2.0` that's no longer a concern. Please note: Because of different rounding behaviour, basically `Math.round` in `Uint8ClampedArray` respectively `Math.floor` in the old code, there will be very slight movement in quite a few existing test-cases. However, the changes should be imperceivable to the naked eye, given that the absolute difference is at most `1` for each RGB component when comparing `master` and this patch (see also the updated expectation values in the unit-tests).	2018-06-12 11:01:32 +02:00
Wojciech Maj	ea2850e9a7	Fix typos	2018-04-01 23:20:41 +02:00
Jonas Jenwald	374d074f6e	Add stricter validation in `Catalog.readPageLabels` The current PageLabel dictionary validation code won't catch some (unlikely) forms of corruption. For example: a `Type`/`S` entry being `null`/`0`/empty string, a `P`/`St` entry being `null`/`0`. Please note: I'm not aware of any bugs caused by the old code, but I've had this patch sitting locally for some time and figured it couldn't hurt to submit it.	2018-03-21 14:36:05 +01:00
Jonas Jenwald	d431ae069d	Attempt to handle corrupt PDF documents that inline Page dictionaries in a Kids array (issue 9540) According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1942297, the contents of a Kids array should be indirect objects.	2018-03-12 14:13:23 +01:00
Jonas Jenwald	1dc54ddb40	Handle PDF files with missing 'endobj' operators, by searching for the "obj" string rather than "endobj" in `XRef.indexObjects` (issue 9105) This patch refactors the searching for 'endobj', to try and find the next occurance of "obj" and then check if it was in fact an 'endobj' and continue searching otherwise. This approach is used to avoid having to first find 'endobj', and then re-check the entire contents of the object and having to run (potentially expensive) regular expressions on arbitrary long strings. Fixes 9105.	2017-12-18 13:17:45 +01:00
Naveen Jain	1135674647	Replaced occurence of `throw new Error` with `unreachable` where applicable	2017-12-14 12:58:50 +05:30
Jonas Jenwald	1cd1582cb9	[api-major] Change `getJavaScript` to return `null`, rather than an empty Array, when no JavaScript exists Other API methods already return `null`, rather than empty Arrays/Objects, hence it makes sense to change `getJavaScript` to be consistent.	2017-10-15 22:17:14 +02:00
Jonas Jenwald	cfb4955a92	Replace the `isArray` helper function with the native `Array.isArray` function Follow-up to PR 8813.	2017-09-01 20:27:13 +02:00
Jonas Jenwald	11408da340	Replace the `isInt` helper function with the native `Number.isInteger` function Follow-up to PR 8643.	2017-09-01 16:52:50 +02:00
Jonas Jenwald	772a5412a4	Avoid some redundant type checks in `XRef.fetchUncompressed` When looking briefly at using `Number.isInteger`/`Number.isNan` rather than `isInt`/`isNaN`, I noticed that there's a couple of not entirely straightforward cases to consider. At first I really couldn't understand why `parseInt` is being used like it is in `XRef.fetchUncompressed`, since the `num` and `gen` properties of an object reference should always be integers. However, doing a bit of code archaeology pointed to PR 4348, and it thus seem that this was a very deliberate change. Since I didn't want to inadvertently introduce any regressions, I've kept the `parseInt` calls intact but moved them to occur only when actually necessary.[1] Secondly, I noticed that there's a redundant `isCmd` check for an edge-case of broken operators. Since we're throwing a `FormatError` if `obj3` isn't a command, we don't need to repeat that check. In practice, this patch could perhaps be considered as a micro-optimization, but considering that `XRef.fetchUncompressed` can be called many thousand times when loading larger PDF documents these changes at least cannot hurt. --- [1] I even ran all tests locally, with an added `assert(Number.isInteger(obj1) && Number.isInteger(obj2));` check, and everything passed with flying colours. However, since it appears that this was in fact necessary at one point, one possible explanation is that the failing test-case(s) have now been replaced by reduced ones.	2017-08-31 16:49:04 +02:00
Jonas Jenwald	42f2d36d1f	Account for broken outlines/annotations, where the destination dictionary contains an invalid `/Dest` entry According to the specification, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=377, a `Dest` entry in an outline item should not contain a dictionary. Unsurprisingly there's PDF generators that completely ignore this, treating is an `A` entry instead. The patch also adds a little bit more validation code in `Catalog.parseDestDictionary`.	2017-08-26 17:38:15 +02:00
Jonas Jenwald	4660cf8238	Prevent an infinite loop in `XRef.readXRef` by keeping track of already parsed tables (bug 1393476) With this patch, not only is the infinite loop prevented, but we're also able to actually render the file (which e.g. Adobe Reader isn't able to). Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1393476.	2017-08-24 19:18:08 +02:00
Tim van der Meij	af71ea7a7d	Merge pull request #8673 from Snuffleupagus/api-pageMode [api-minor] Add support for PageMode in the API and viewer (issue 8657)	2017-07-23 13:17:07 +02:00
Jonas Jenwald	814fa1dee3	Remove most `assert()` calls (issue 8506) This replaces `assert` calls with `throw new FormatError()`/`throw new Error()`. In a few places, throwing an `Error` (which is what `assert` meant) isn't correct since the enclosing function is supposed to return a `Promise`, hence some cases were changed to `Promise.reject(...)` and similarily for `createPromiseCapability` instances.	2017-07-21 18:51:02 +02:00
Jonas Jenwald	15f0963f51	Fix a typo, in the `Catalog.numPages` getter, than prevents shadowing from working correctly Looking at the blame, it seems that this typo was present even before PR 700 (almost six years ago). The result of using `'num'`, rather than the correct `'numPages'` string, is that the `Catalog.numPages` getter isn't actually being shadowed.	2017-07-20 12:35:09 +02:00
Jonas Jenwald	16c5d41c5b	[api-minor] Add support for PageMode in the API (issue 8657) Please refer to https://wwwimages2.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=82.	2017-07-19 16:40:03 +02:00
Yury Delendik	d028c26210	Removes error()	2017-07-07 09:40:24 -05:00
Jonas Jenwald	3a20fd165f	Refactor `ObjectLoader` to use `Dict`s correctly, rather than abusing their internal properties The `ObjectLoader` currently takes an Object as input, despite actually working with `Dict`s internally. This means that at the (two) existing call-sites, we're passing in the "private" `Dict.map` property directly. Doing this seems like an anti-pattern, and we could (and even should) simply provide the actual `Dict` when creating an `ObjectLoader` instance. Accessing properties stored in the `Dict` is now done using the intended methods instead, in particular `getRaw` which (as the name suggests) doesn't do any de-referencing, thus maintaining the current functionality of the code. The only functional change in this patch is that `ObjectLoader.load` will now ignore empty nodes, such that `ObjectLoader._walk` only needs to deal with nodes that are known to contain data. (This lets us skip, among other checks, meaningless `addChildren` function calls.)	2017-06-16 22:59:32 +02:00
Jonas Jenwald	f2fc9ee281	Slightly refactor and ES6-ify the code in `ObjectLoader` This patch changes all `var` to `let`, and caches the array lengths in all loops. Also removes two unnecessary temporary variable assignments.	2017-06-16 22:59:32 +02:00
Jonas Jenwald	a8c87f8019	Fix inconsistent spacing and trailing commas in objects in `src/core/` files, so we can enable the `comma-dangle` and `object-curly-spacing` ESLint rules later on Unfortunately this patch is fairly big, even though it only covers the `src/core` folder, but splitting it even further seemed difficult. http://eslint.org/docs/rules/comma-dangle http://eslint.org/docs/rules/object-curly-spacing Given that we currently have quite inconsistent object formatting, fixing this in one big patch probably wouldn't be feasible (since I cannot imagine anyone wanting to review that); hence I've opted to try and do this piecewise instead. Please note: This patch was created automatically, using the ESLint --fix command line option. In a couple of places this caused lines to become too long, and I've fixed those manually; please refer to the interdiff below for the only hand-edits in this patch. ```diff diff --git a/src/core/evaluator.js b/src/core/evaluator.js index abab9027..dcd3594b 100644 --- a/src/core/evaluator.js +++ b/src/core/evaluator.js @@ -2785,7 +2785,8 @@ var EvaluatorPreprocessor = (function EvaluatorPreprocessorClosure() { t['Tz'] = { id: OPS.setHScale, numArgs: 1, variableArgs: false, }; t['TL'] = { id: OPS.setLeading, numArgs: 1, variableArgs: false, }; t['Tf'] = { id: OPS.setFont, numArgs: 2, variableArgs: false, }; - t['Tr'] = { id: OPS.setTextRenderingMode, numArgs: 1, variableArgs: false, }; + t['Tr'] = { id: OPS.setTextRenderingMode, numArgs: 1, + variableArgs: false, }; t['Ts'] = { id: OPS.setTextRise, numArgs: 1, variableArgs: false, }; t['Td'] = { id: OPS.moveText, numArgs: 2, variableArgs: false, }; t['TD'] = { id: OPS.setLeadingMoveText, numArgs: 2, variableArgs: false, }; diff --git a/src/core/jbig2.js b/src/core/jbig2.js index 5a17d482..71671541 100644 --- a/src/core/jbig2.js +++ b/src/core/jbig2.js @@ -123,19 +123,22 @@ var Jbig2Image = (function Jbig2ImageClosure() { { x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, { x: -2, y: 0, }, { x: -1, y: 0, }], [{ x: -3, y: -1, }, { x: -2, y: -1, }, { x: -1, y: -1, }, { x: 0, y: -1, }, - { x: 1, y: -1, }, { x: -4, y: 0, }, { x: -3, y: 0, }, { x: -2, y: 0, }, { x: -1, y: 0, }] + { x: 1, y: -1, }, { x: -4, y: 0, }, { x: -3, y: 0, }, { x: -2, y: 0, }, + { x: -1, y: 0, }] ]; var RefinementTemplates = [ { coding: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }], - reference: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, }, - { x: 1, y: 0, }, { x: -1, y: 1, }, { x: 0, y: 1, }, { x: 1, y: 1, }], + reference: [{ x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }, + { x: 0, y: 0, }, { x: 1, y: 0, }, { x: -1, y: 1, }, + { x: 0, y: 1, }, { x: 1, y: 1, }], }, { - coding: [{ x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, { x: -1, y: 0, }], - reference: [{ x: 0, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, }, { x: 1, y: 0, }, - { x: 0, y: 1, }, { x: 1, y: 1, }], + coding: [{ x: -1, y: -1, }, { x: 0, y: -1, }, { x: 1, y: -1, }, + { x: -1, y: 0, }], + reference: [{ x: 0, y: -1, }, { x: -1, y: 0, }, { x: 0, y: 0, }, + { x: 1, y: 0, }, { x: 0, y: 1, }, { x: 1, y: 1, }], } ]; ```	2017-06-02 11:20:19 +02:00
Jonas Jenwald	982b6aa65b	Convert the files in the `/src/core` folder to ES6 modules Please note that the `glyphlist.js` and `unicode.js` files are converted to CommonJS modules instead, since Babel cannot handle files that large and they are thus excluded from transpilation.	2017-05-30 22:06:21 +02:00
Jonas Jenwald	0ddf52aca5	Remove the special handling for `nameddest`s that look like standard pageNumbers PR 7341 added special handling for `nameddest`s that look like pageNumbers, to prevent issues since we previously incorrectly supported specifying a pageNumber directly in the hash; i.e. `#10` versus the correct `#page=10` format. Since this behaviour wasn't correct, PR 7757 fixed and deprecated the old format, which means that we no longer need to maintain the `nameddest` hack in multiple files.	2017-05-20 11:29:29 +02:00
Jonas Jenwald	ebaa22478c	Replace unnecessary `bind(this)` and `var self = this` statements with arrow functions in remaining `src/core/` files	2017-05-02 15:47:43 +02:00
Jonas Jenwald	afc74b0178	Enable the `object-shorthand` ESLint rule in `src/shared` Please see http://eslint.org/docs/rules/object-shorthand. For the most part, these changes are of the search-and-replace kind, and the previously enabled `no-undef` rule should complement the tests in helping ensure that no stupid errors crept into to the patch.	2017-04-27 17:29:40 +02:00
Jonas Jenwald	61ee0de29f	Use a simple `RefSetCache` to significantly improve the performance of `Catalog.getPageDict` for certain long documents (PR 8105 follow-up) I found that PR 8105 unfortunately causes a very serious performance regression in long PDF documents where the `Pages` tree only has one level; my apologies for this! Obviously we cannot revert that PR, since that would cause more issues than it solves. Hence it seems to me that the only viable solution here, is to add a simple `RefSetCache` to reduce the amount of redundant lookups. Previously in PR 8105 caching was thought to be unnecessary, but as it turns out I don't think that we really have a choice in the matter any more.	2017-03-28 21:39:55 +02:00
Rob Wu	49af56f730	Rethrow MissingDataException when needed In core/document.js: `PDFDocument.prototype.parse` accesses a dictionary property, which could throw if the underlying data is not yet available. In core/obj.js: `get Catalog.prototype.metadata` calls `stream.getBytes`, which can throw MissingDataException too when the stream is a ChunkedStream.	2017-03-22 14:55:59 +01:00
Jonas Jenwald	9163a6fba4	Merge pull request #8112 from Snuffleupagus/JS-action-newWindow Support the `newWindow` flag in white-listed `app.launchURL` JavaScript actions (PR 7794 follow-up)	2017-03-01 21:24:34 +01:00
Jonas Jenwald	2a7e5b8a54	Support the `newWindow` flag in white-listed `app.launchURL` JavaScript actions (PR 7794 follow-up) A simple follow-up to PR 7794, which let's us add support for the `newWindow` parameter; refer to https://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/js_api_reference.pdf#G5.1507380. The patch also fixes an embarrassing oversight regarding the placement of the case-insensitive flag, and also allows arbitrary white-space at the beginning of JS actions.	2017-02-27 15:58:28 +01:00
Jonas Jenwald	14cc6acb90	Ensure that `Dict`s found in Object Streams are assigned an `objId` in `XRef.fetch` This fixes something that I noticed while working with the code in `Catalog.getPageDict` when debugging issue 8088. Note that while I don't have an example where this patch really matters, given that e.g. `PartialEvaluator.hasBlendModes` depends on the `objId` to avoid cyclic references this patch could potentially help for some PDF files.	2017-02-25 10:20:19 +01:00
Jonas Jenwald	1ce295541c	Always check all Kids nodes, in `Catalog.getPageDict`, to avoid getting stuck in an empty node further down in the Pages tree (issue 8088) As discussed on IRC, we need to check all nodes at the bottom of the tree to ensure that we find the correct `Page` dict. Furthermore, this patch also gets rid of the caching present in a previous version, since it's not clear if that really helps. Note that this patch purposely adds an `eq` test, using a reduced test-case, so that we can be sure that the algorithm actually finds the correct `Page` dict for each `pageIndex`. Fixes 8088.	2017-02-24 12:09:46 +01:00
Jonas Jenwald	111419a64a	Cache built-in binary CMap files in the worker (issue 4794)	2017-02-16 10:55:39 +01:00
Jonas Jenwald	4046d67fde	Enable the `no-else-return` ESLint rule Using `else` after `return` is not necessary, and can often lead to unnecessarily cluttered code. By using the `no-else-return` rule in ESLint we can avoid this pattern, see http://eslint.org/docs/rules/no-else-return.	2017-01-09 20:27:39 +01:00
Jonas Jenwald	14b8523314	Refactor the `password` handling so that it's stored in the `PdfManager`s, instead of in the `XRef` We're already passing in a, currently unused, `PdfManager` instance when initializing the `XRef`. To avoid having to pass a single `password` parameter around, we could thus simply get the `password` through the `PdfManager` instance instead.	2017-01-03 20:29:52 +01:00
Jonas Jenwald	c6008b4d7c	Fix the JSDoc comment for `Catalog.parseDestDictionary`	2016-11-27 11:18:18 +01:00
Jonas Jenwald	6d8a404a9c	[api-minor] Add support for a couple of white-listed `JavaScript` actions that contains valid URLs (issue 3897, bug 843699) By only allowing very specific type of `JavaScript` actions, and also utilizing the existing `URL` validation, this patch shouldn't pose too much risk. Fixes one of the points in issue 3897 (with the PDF file taken from issue 3438). Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=843699 (probably, since that bug doesn't contain a test-case).	2016-11-08 16:48:27 +01:00
Jonas Jenwald	0844a72b4d	Add a bit more validation to `Catalog_readPageLabels`, to ensure that the Page Labels are well formed	2016-11-03 20:08:06 +01:00
Jonas Jenwald	2d8d8b5e53	Use `stringToPDFString` to sanitizing bad "Prefix" entries in Page Label dictionaries It seems that certain bad PDF generators can create badly encoded "Prefix" entries for Page Labels, one example being http://ukjewishfilm.org/wp-content/uploads/2015/09/Jewish-Film-Festival-Programme-ONLINE.pdf. Unfortunately I didn't come across such a PDF file while adding the API support for Page Labels, but with them now being used in the viewer I just found this issue. With this patch, we now display the Page Labels in the same way as Adobe Reader.	2016-11-03 19:48:08 +01:00
Tim van der Meij	6e22b32372	Merge pull request #7745 from Snuffleupagus/Launch-actions [api-minor] Add basic support for `Launch` actions (issue 1778, issue 3897, issue 6616)	2016-11-01 21:12:08 +01:00
Tim van der Meij	5194e68134	Lint: correct code style violations Manual observations and working with other linting tools found these.	2016-11-01 15:04:21 +01:00
Jonas Jenwald	2b79782377	[api-minor] Add basic support for `Launch` actions (issue 1778, issue 3897, issue 6616) In general we neither want, nor can, support arbitrary `Launch` actions. But in practice, all the cases we've seen so far just contains relative URLs to other PDF files. Building on PR 7689, we can thus at least support basic `Launch` actions.	2016-10-21 13:40:32 +02:00
Jonas Jenwald	d284cfd5eb	[api-minor] Add support for relative URLs, in both annotations and the outline, by adding a `docBaseUrl` parameter to `PDFJS.getDocument` (bug 766086) Note that in `FIREFOX/MOZCENTRAL/CHROME` builds of the standard viewer the `docBaseUrl` parameter will be set by default, since in that case it makes sense to use the current URL as a base. For the `GENERIC` viewer, or the API itself, it doesn't make sense to try and set the `docBaseUrl` by default. However, custom deployments/implementations may still find the parameter useful.	2016-10-19 22:20:24 +02:00
Jonas Jenwald	71a781ee5c	Deprecate the `isValidUrl` utility function and replace it with `createValidAbsoluteUrl`/`isValidProtocal` functions instead, since the main URL validation is now done using the `new URL` constructor	2016-10-19 22:11:22 +02:00
Jonas Jenwald	42f07c6262	[api-minor] Use the `new URL` constructor when validating URLs in annotations and the outline, as a complement to only checking the protocol, and add a bit more validation to `Catalog_parseDestDictionary` Note that this will automatically reject any relative URL. To make the API more useful to consumers, URLs that are rejected will be available via the `unsafeUrl` property in the data object returned by `PDFPageProxy_getAnnotations`. The patch also adds a bit more validation of the data for `Named` actions.	2016-10-19 22:11:17 +02:00
Jonas Jenwald	e64bc1fd13	Move parsing of destination dictionaries to a helper function This not only reduces code duplication, but it also allow us to easily support the same kind of URLs we currently do for Link annotations in the Outline as well.	2016-10-18 16:14:07 +02:00
Jonas Jenwald	3e77cf6b32	Prevent an infinite loop in `XRef_fetchUncompressed` for encrypted PDF files with indirect objects in the /Encrypt dictionary (issue 7665)	2016-09-25 00:18:47 +02:00

1 2 3