pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	3bcf9187ec	Add a polyfill for `classList.{add, remove}` with more than one parameter Unsurprisingly, IE11 doesn't support this, so a polyfill is needed since otherwise the sidebar can no longer be opened. Also simplifies the existing `classList.toggle` polyfill.	2019-02-08 13:35:01 +01:00
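The commit above wraps `classList.add`/`remove` so that several tokens can be passed at once. The sketch below shows the idea under stated assumptions. `patchTokenListPrototype` is a hypothetical helper, not the PDF.js source. It is applied to a mock token list that, like IE11, only honours the first argument, so it runs without a DOM.

```javascript
// Hypothetical helper: make single-token add/remove accept many tokens,
// and make toggle honour the optional `force` argument.
function patchTokenListPrototype(proto) {
  const originalAdd = proto.add;
  const originalRemove = proto.remove;

  proto.add = function (...tokens) {
    for (const token of tokens) {
      originalAdd.call(this, token);
    }
  };
  proto.remove = function (...tokens) {
    for (const token of tokens) {
      originalRemove.call(this, token);
    }
  };
  proto.toggle = function (token, force) {
    // Without `force`, flip the current state; with it, set that state explicitly.
    const shouldAdd = force === undefined ? !this.contains(token) : !!force;
    this[shouldAdd ? "add" : "remove"](token);
    return shouldAdd;
  };
}

// Mock of an IE11-style token list that silently ignores extra arguments.
class MockTokenList {
  constructor() {
    this.tokens = new Set();
  }
  add(token) {
    this.tokens.add(token);
  }
  remove(token) {
    this.tokens.delete(token);
  }
  contains(token) {
    return this.tokens.has(token);
  }
}

patchTokenListPrototype(MockTokenList.prototype);
```

After patching, `list.add("a", "b")` adds both tokens instead of only `"a"`.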
Jonas Jenwald	614e502227	[api-minor] Remove the `document.currentScript` polyfill This polyfill is currently used in only one file, i.e. `src/display/api.js`, and only when trying to build a fallback `workerSrc` path. Given that the global `workerSrc` should always be set[1] when using the PDF.js library[2], and that the fallback `workerSrc` should only be regarded as a best-effort solution anyway, there isn't a particularly strong reason to keep the compatibility code in my opinion.

---
[1] Other supported options include setting the global `workerPort`, or passing in a `PDFWorker` instance as part of the `getDocument` call.
[2] Which is clearly mentioned in the JSDocs in `src/display/worker_options.js`.	2019-02-03 14:09:24 +01:00
Jonas Jenwald	22468817e1	Add a `settled` property, tracking the fulfilled/rejected state of the Promise, to `createPromiseCapability` This allows cleaning-up code which is currently manually tracking the state of the Promise of a `createPromiseCapability` instance.	2019-02-02 15:18:56 +01:00
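The commit above adds a `settled` flag to promise capabilities. The sketch below shows one way to layer a read-only flag on a capability object. Its structure is an assumption based on the commit message, not a copy of the PDF.js source.

```javascript
// Sketch: a promise "capability" whose `settled` flag flips to true as soon as
// resolve() or reject() is called, so callers need not track this manually.
function createPromiseCapability() {
  const capability = Object.create(null);
  let isSettled = false;

  // Read-only getter: callers can observe, but not change, the state.
  Object.defineProperty(capability, "settled", {
    get() {
      return isSettled;
    },
  });
  // The executor runs synchronously, so resolve/reject exist before we return.
  capability.promise = new Promise((resolve, reject) => {
    capability.resolve = (data) => {
      isSettled = true;
      resolve(data);
    };
    capability.reject = (reason) => {
      isSettled = true;
      reject(reason);
    };
  });
  return capability;
}
```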
Jonas Jenwald	6f94a05a29	Do the final text scaling correctly in `flushTextContentItem` (issue 8276) It's necessary to take into account whether or not the text is vertical, to avoid either the textContent `width` or `height` becoming incorrect.	2019-01-29 15:24:04 +01:00
Jonas Jenwald	5081063b9e	Attempt to clean-up/restore pending rendering operations when errors occur while a `RenderTask` runs (PR 10202 follow-up) This piggybacks on the existing `cancel` functionality, to ensure that any pending operations are closed and that any temporary canvases are actually being removed. Also simplifies `finishPaintTask` in `PDFPageView.draw` slightly, by converting it to an async function.	2019-01-26 16:02:51 +01:00
Jonas Jenwald	29f36d7a1b	Reduce unnecessary duplication of the `isDefaultDecode` methods on `ColorSpace` instances The recent PR 10482 made me realize that I missed an opportunity for simplification when doing the class conversion of this code in PR 10007.	2019-01-25 08:53:08 +01:00
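The commit above replaces per-instance `isDefaultDecode` methods with a shared helper. The sketch below shows what one such helper could check. A Decode array counts as "default" when it maps every component to [0, 1]. Treating a missing or wrongly sized array as default is an assumption of this sketch.

```javascript
// Sketch of a single shared helper: a Decode array is the "default" one when
// it equals [0, 1, 0, 1, ...] with one [min, max] pair per colour component.
function isDefaultDecode(decode, numComps) {
  if (!Array.isArray(decode)) {
    return true; // No Decode entry: nothing to apply.
  }
  if (decode.length !== numComps * 2) {
    return true; // Malformed entry: ignore it rather than fail (assumption).
  }
  for (let i = 0; i < decode.length; i += 2) {
    if (decode[i] !== 0 || decode[i + 1] !== 1) {
      return false;
    }
  }
  return true;
}
```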
Tim van der Meij	e2701d5422	Merge pull request #10482 from janpe2/indexed-decode Implement Decode entry in Indexed images	2019-01-24 23:46:55 +01:00
Jonas Jenwald	41fbc71ef9	Ensure that `XRef.indexObjects` can handle object numbers with zero-padding (issue 10491) All objects in the PDF document follow this pattern:
```
0000000001 0 obj
<<
% Some content here...
>>
endobj

0000000002 0 obj
<<
% More content here...
>>
endobj
```	2019-01-24 22:37:18 +01:00
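The commit above makes `XRef.indexObjects` accept zero-padded object numbers. The regex sketch below illustrates what tolerating zero-padded headers means. `parseObjHeader` is hypothetical. `XRef.indexObjects` presumably works with a byte-level scanner rather than a regular expression.

```javascript
// Hypothetical helper: parse an "N G obj" header, tolerating zero-padded
// object numbers such as "0000000001 0 obj", and "obj" glued to "<<".
const OBJ_HEADER = /^(\d+)\s+(\d+)\s+obj\b/;

function parseObjHeader(line) {
  const match = OBJ_HEADER.exec(line);
  if (!match) {
    return null;
  }
  // parseInt with an explicit radix ignores leading zeros ("0000000001" -> 1).
  return { num: parseInt(match[1], 10), gen: parseInt(match[2], 10) };
}
```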
Jonas Jenwald	249b199ff1	Stop bundling the `ReadableStream` polyfill in MOZCENTRAL builds (PR 10470 follow-up) Based on the discussion in https://bugzilla.mozilla.org/show_bug.cgi?id=1521413, this patch simply removes the `ReadableStream` polyfill completely from MOZCENTRAL builds. With this patch, the size of the `gulp mozcentral` build target is thus further reduced (building on PR 10470):

|        | `build/mozcentral` |
|--------|--------------------|
| master | 3 339 666          |
| patch  | 3 209 572          |	2019-01-23 20:33:20 +01:00
Jani Pehkonen	26121177ab	Implement Decode entry in Indexed images	2019-01-22 22:51:04 +02:00
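The commit above implements the Decode entry for Indexed images. For an Indexed colour space, the PDF specification gives the default Decode array as [0, 2^bpc − 1], where bpc is bits per component, rather than [0, 1]. The sketch below checks for that default and maps a raw sample through a possibly inverted Decode range. The function names are hypothetical, and the rounding and clamping policy in `decodeIndex` is an assumption.

```javascript
// Sketch: for an Indexed colour space the default Decode array is
// [0, 2^bitsPerComponent - 1], not [0, 1].
function isDefaultIndexedDecode(decode, bitsPerComponent) {
  if (!Array.isArray(decode) || decode.length !== 2) {
    return true;
  }
  if (!Number.isInteger(bitsPerComponent) || bitsPerComponent < 1) {
    return true;
  }
  const maxIndex = (1 << bitsPerComponent) - 1;
  return decode[0] === 0 && decode[1] === maxIndex;
}

// Map a raw sample to a palette index using a Decode range [dMin, dMax].
// Rounding to the nearest entry and clamping to [0, hival] are assumptions.
function decodeIndex(sample, decode, bitsPerComponent, hival) {
  const maxSample = (1 << bitsPerComponent) - 1;
  const [dMin, dMax] = decode;
  const value = dMin + (sample * (dMax - dMin)) / maxSample;
  return Math.min(Math.max(Math.round(value), 0), hival);
}
```

With `decode = [255, 0]` and 8 bits per component, sample 0 maps to palette entry 255. This effectively inverts the palette lookup.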
Jonas Jenwald	01d624f6a0	Add an `Array.from` polyfill, using core-js, and remove some compatibility hacks from the `src/display/content_disposition.js` file	2019-01-20 08:49:20 +01:00
Tim van der Meij	66acc7397f	Merge pull request #10470 from Snuffleupagus/mozcentral-streams Try to, completely, avoid loading the `ReadableStream` polyfill in MOZCENTRAL builds	2019-01-19 21:22:18 +01:00
Jonas Jenwald	480110625a	Try to, completely, avoid loading the `ReadableStream` polyfill in MOZCENTRAL builds With https://bugzilla.mozilla.org/show_bug.cgi?id=1505122 landing in Firefox 65, the native `ReadableStream` implementation is now enabled by default in Firefox. Obviously it would be nice to simply stop bundling the polyfill in MOZCENTRAL builds altogether, however given that it's still possible to disable[1] `ReadableStream` this is probably not a good idea just yet. Nonetheless, now that native support is available, it seems unnecessary (and wasteful) to keep bundling the polyfill twice[2] in MOZCENTRAL builds. Hence this patch, which contains a suggested approach for packing the polyfill in a separate file which is then only loaded if/when needed. With this patch, the size of the `gulp mozcentral` build target is thus reduced accordingly:

|        | `build/mozcentral` |
|--------|--------------------|
| master | 3 461 089          |
| patch  | 3 340 268          |

Besides the PDF.js files taking up less space in Firefox this way, the additional benefit is that there's (by default) less code that needs to be loaded and parsed when the PDF Viewer is used, which also cannot hurt.

---
[1] In `about:config`, by toggling the `javascript.options.streams` preference.
[2] Once in the `build/pdf.js` file, and once in the `build/pdf.worker.js` file.	2019-01-19 09:05:01 +01:00
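Loading the polyfill only "if/when needed" requires a runtime check for a usable native `ReadableStream`. The check has to cover the case where the preference has disabled it. Below is a minimal sketch of such a check and a lazy loader. `globalScope` is passed in so the function can be exercised without a browser. `loadPolyfill` is a hypothetical callback standing in for whatever fetches the separate polyfill file.

```javascript
// Sketch: only load the ReadableStream polyfill when the native
// implementation is missing or unusable (e.g. disabled via a preference).
function isReadableStreamSupported(globalScope) {
  if (typeof globalScope.ReadableStream === "undefined") {
    return false;
  }
  try {
    // Constructing a stream exercises the implementation, rather than only
    // checking that the global name exists.
    new globalScope.ReadableStream({
      start(controller) {
        controller.close();
      },
    });
    return true;
  } catch (e) {
    return false;
  }
}

// `loadPolyfill` is a hypothetical callback that fetches the separate file.
async function ensureReadableStream(globalScope, loadPolyfill) {
  if (!isReadableStreamSupported(globalScope)) {
    await loadPolyfill();
  }
}
```

In a browser this would be called with `globalThis`. With native support present, the polyfill file is never requested or parsed.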
Jonas Jenwald	24a688d6c6	Convert some usage of `indexOf` to `startsWith`/`includes` where applicable In many cases in the code you don't actually care about the index itself, but rather just want to know if something exists in a String/Array or if a String starts in a particular way. With modern JavaScript functionality, it's thus possible to remove a number of existing `indexOf` cases.	2019-01-18 17:57:41 +01:00
Tim van der Meij	cdbc33ba06	Merge pull request #10457 from Snuffleupagus/metadata-tests When parsing Metadata, attempt to remove "junk" before the first tag (PR 10398 follow-up)	2019-01-16 23:03:39 +01:00
Jonas Jenwald	68ad3e8e9d	Tweak the `DOMTokenList.toggle` polyfill (issue 10460)	2019-01-16 20:15:44 +01:00
Jonas Jenwald	9f45f8dfda	When parsing Metadata, attempt to remove "junk" before the first tag (PR 10398 follow-up) This will allow the Metadata to be successfully extracted from the PDF file in issue 10395. Furthermore, this patch also fixes a bug in `Metadata.get` which causes the method to return `null` rather than an empty string or zero (since either ought to be allowed).	2019-01-16 12:44:27 +01:00
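The commit above makes two changes: it strips "junk" before the first tag, and it fixes `Metadata.get`. The sketch below illustrates both. The helper names and the metadata keys in the example are hypothetical. The point of the second function is that a `value || null` style lookup would wrongly turn `""` or `0` into `null`.

```javascript
// Sketch: drop everything before the first "<", so that an XML parser can
// handle XMP packets with leading garbage.
function stripLeadingJunk(data) {
  return data.replace(/^[^<]+/, "");
}

// Sketch of the `get` fix: only a missing key yields null; falsy values such
// as "" or 0 are returned as-is.
function getMetadataValue(metadata, name) {
  const value = metadata[name];
  return value !== undefined ? value : null;
}
```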
Jonas Jenwald	b531fc4106	Avoid truncating inline images, where the data and the "EI" marker are glued together (issue 10388) (#10436) Thanks to the excellent debugging done by @janpe2, this was easy to fix!	2019-01-12 20:31:23 +01:00
Jonas Jenwald	d4a3858ed5	Handle more cases of corrupt PDF files with missing 'endobj' operators, where the "obj" string is immediately followed by the dictionary (PR 9288 follow-up)	2019-01-10 17:55:28 +01:00
Jonas Jenwald	358cd0c096	Add a few more `String` polyfills (startsWith, endsWith, padStart, padEnd)	2019-01-06 20:10:55 +01:00
Tim van der Meij	f162fed6b9	Convert `src/core/charsets.js` and `src/core/standard_fonts.js` to ES6 syntax Moreover, add the "no var" ESLint comment to `src/core/annotation.js` and `src/core/ps_parser.js` since they are already converted.	2019-01-06 15:04:01 +01:00
Tim van der Meij	3b637e71d4	Convert `src/core/arithmetic_decoder.js` to ES6 syntax	2019-01-06 15:04:01 +01:00
Tim van der Meij	b81984f0cb	Merge pull request #10417 from brendandahl/metric-length Fix reading number of HMTX metrics.	2019-01-05 13:35:16 +01:00
Jonas Jenwald	e8f4b47d59	Prevent errors, in `SimpleXMLParser.onEndElement`, when the stack has already been completely parsed (issue 10410) The error was triggered for a particular set of metadata, where an end tag was encountered without the corresponding begin tag being present in the data. (The patch also fixes a minor oversight, from a recent PR, in the `SimpleDOMNode.nextSibling` method.)	2019-01-05 11:15:34 +01:00
Brendan Dahl	32eace043b	Fix reading number of HMTX metrics. The length of the HHEA table can be incorrect, so it is better to read the number of metrics at its offset from the beginning of the table instead.	2019-01-04 15:13:13 -08:00
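In the OpenType `hhea` table, `numberOfHMetrics` is the last field: a big-endian uint16 at byte offset 34 of the 36-byte table. Reading it at a fixed offset from the table start means a wrong declared table length cannot move the read position. The sketch below shows this. `readNumberOfHMetrics` is a hypothetical helper, and returning 0 for a truncated table is an assumption.

```javascript
// Sketch: read numberOfHMetrics from its fixed position in the 'hhea' table
// (byte offset 34, big-endian uint16), independent of the declared length.
const NUMBER_OF_HMETRICS_OFFSET = 34;

function readNumberOfHMetrics(fontData, hheaOffset) {
  const pos = hheaOffset + NUMBER_OF_HMETRICS_OFFSET;
  if (pos + 2 > fontData.length) {
    return 0; // Truncated table; the caller decides how to recover (assumption).
  }
  return (fontData[pos] << 8) | fontData[pos + 1];
}
```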
Tim van der Meij	b39ec7af96	Merge pull request #10408 from Snuffleupagus/issue-10407 Prevent errors, because of incorrect scope, in the `XMLParserBase._resolveEntities` method (issue 10407)	2019-01-04 23:45:26 +01:00
Jonas Jenwald	66fccd860b	Adjust how `AnnotationBorderStyle.setWidth` handles the input being a `Name` (issue 10385) In order to be consistent with the behaviour in Adobe Reader, the width will now always be set to zero when the input is a `Name`.	2019-01-04 10:38:10 +01:00
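The commit above adjusts `setWidth` for `Name` input. Below is a simplified sketch of the resulting behaviour: a `Name` width becomes zero, a numeric width is accepted, and anything else is ignored. `PdfName` and `BorderStyleSketch` are hypothetical stand-ins for the PDF.js primitives. Accepting any finite number is an assumption.

```javascript
// Minimal stand-in for a PDF Name object (hypothetical, for illustration).
class PdfName {
  constructor(name) {
    this.name = name;
  }
}

class BorderStyleSketch {
  constructor() {
    this.width = 1;
  }

  setWidth(width) {
    if (width instanceof PdfName) {
      // A Name is never a valid width; matching Adobe Reader, draw no border.
      this.width = 0;
      return;
    }
    if (Number.isFinite(width)) {
      this.width = width;
    }
    // Any other input is ignored and the previous width is kept.
  }
}
```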
Jonas Jenwald	6cd9ff48f3	Prevent errors, because of incorrect scope, in the `XMLParserBase._resolveEntities` method (issue 10407)	2019-01-04 10:13:32 +01:00
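The "incorrect scope" in the commit above is a `this`-binding problem. Inside a class body, code is strict. A plain `function` passed to `String.prototype.replace` therefore sees `this === undefined`, while an arrow function keeps the instance. The sketch below illustrates this. `EntityResolver` and its `onResolveEntity` hook are hypothetical and simplified.

```javascript
class EntityResolver {
  // Hypothetical hook for entities that are not built in.
  onResolveEntity(name) {
    return `&${name};`;
  }

  resolveEntities(s) {
    // The arrow function keeps `this` bound to the instance; a plain
    // `function` callback would see `this === undefined` here.
    return s.replace(/&([^;]+);/g, (all, entity) => {
      if (entity.startsWith("#x")) {
        return String.fromCharCode(parseInt(entity.substring(2), 16));
      }
      if (entity.startsWith("#")) {
        return String.fromCharCode(parseInt(entity.substring(1), 10));
      }
      switch (entity) {
        case "lt":
          return "<";
        case "gt":
          return ">";
        case "amp":
          return "&";
        case "quot":
          return '"';
        case "apos":
          return "'";
      }
      return this.onResolveEntity(entity);
    });
  }
}
```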
Tim van der Meij	2d00bb098b	Merge pull request #10404 from Snuffleupagus/issue-10401 Remove the `for ... of` loop from the `PDFDocument.fingerprint` getter (issue 10401)	2019-01-03 22:46:51 +01:00
Brendan Dahl	e2686db49b	Merge pull request #10277 from janpe2/cff-stems Repair CFF fonts if stem hints are in wrong order	2019-01-03 10:30:43 -08:00
Jonas Jenwald	8c278530dd	Remove the `for ... of` loop from the `PDFDocument.fingerprint` getter (issue 10401) It appears that the `Symbol` polyfill doesn't work well in conjunction with `TypedArray`s, and that part of PR 10393 is thus reverted.	2019-01-03 11:17:45 +01:00
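Replacing `for ... of` with an indexed loop avoids relying on `Symbol.iterator` for `TypedArray`s, which the polyfill apparently did not cover. The sketch below shows hex formatting of a hash stored in a `Uint8Array`. `bytesToHex` is a hypothetical name, not the `fingerprint` getter itself.

```javascript
// Sketch: format hash bytes as lowercase hex with a classic indexed loop,
// avoiding `for ... of` (and thus Symbol.iterator) over a TypedArray.
function bytesToHex(bytes) {
  let hex = "";
  for (let i = 0, ii = bytes.length; i < ii; i++) {
    const part = bytes[i].toString(16);
    hex += part.length === 1 ? "0" + part : part; // Zero-pad single digits.
  }
  return hex;
}
```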
Tim van der Meij	1b84b2ed60	Merge pull request #10398 from Snuffleupagus/issue-10395 Prevent errors in various methods in `SimpleDOMNode` when the `childNodes` property is not defined (issue 10395)	2019-01-01 16:22:11 +01:00
Jonas Jenwald	d371d23382	Prevent errors in various methods in `SimpleDOMNode` when the `childNodes` property is not defined (issue 10395) Given that the issue, as filed, is incomplete since no PDF file was provided for debugging, this patch is really the best that we can do here. Please note: This patch will not enable the Metadata to be successfully parsed, but it should at least prevent the errors.	2018-12-31 13:07:15 +01:00
Tim van der Meij	d8f201ea2a	Merge pull request #10397 from Snuffleupagus/issue-10385 Ensure that `AnnotationBorderStyle.setWidth` is able to handle the input being a `Name`, to correctly deal with corrupt PDF documents (issue 10385)	2018-12-31 12:58:28 +01:00
Jonas Jenwald	76a9580aeb	Ensure that `AnnotationBorderStyle.setWidth` is able to handle the input being a `Name`, to correctly deal with corrupt PDF documents (issue 10385)	2018-12-31 12:21:28 +01:00
Jonas Jenwald	15b3806937	Actually validate the input in `AnnotationBorderStyle.setStyle`	2018-12-31 12:15:15 +01:00
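The PDF specification defines five border style names for the `/S` entry of a border style dictionary: S (solid), D (dashed), B (beveled), I (inset) and U (underline). Validating the input means accepting only those names and keeping the current style otherwise. The sketch below models names as plain strings, unlike PDF.js's `Name` objects. The returned style labels are hypothetical.

```javascript
// Border style names defined for annotation /BS dictionaries (the /S entry).
const BorderStyle = Object.freeze({
  S: "solid",
  D: "dashed",
  B: "beveled",
  I: "inset",
  U: "underline",
});

// Sketch: accept only known style names; otherwise keep the current style
// instead of storing whatever value a (possibly corrupt) PDF provides.
function validatedBorderStyle(styleName, currentStyle) {
  if (
    typeof styleName === "string" &&
    Object.prototype.hasOwnProperty.call(BorderStyle, styleName)
  ) {
    return BorderStyle[styleName];
  }
  return currentStyle;
}
```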
Tim van der Meij	5b57e69da2	Optimize `CanvasGraphics.setFont` to avoid intermediate string creation This method creates quite a few intermediate strings on each call and it's called often, even for smaller documents like the Tracemonkey document. Scrolling from top to bottom in that document resulted in 14126 strings being created in this method. With this commit applied, this is reduced to 2018 strings.	2018-12-30 14:58:32 +01:00
Tim van der Meij	95f9075565	Optimize `TextLayerRenderTask._layoutText` to avoid intermediate string creation This method creates quite a few intermediate strings on each call and it's called often, even for smaller documents like the Tracemonkey document. Scrolling from top to bottom in that document resulted in 12936 strings being created in this method. With this commit applied, this is reduced to 3610 strings.	2018-12-30 14:39:08 +01:00
Tim van der Meij	d5e5d18430	Convert the `PDFDocument` class in `src/core/document.js` to ES6 syntax	2018-12-30 13:54:43 +01:00
Tim van der Meij	612fc9fcc2	Convert the `Page` class in `src/core/document.js` to ES6 syntax	2018-12-30 13:54:43 +01:00
Tim van der Meij	aad27ff9a0	Optimize the `Ref` class in `src/core/primitives.js` The `toString` method always creates two string objects (for the 'R' character and for the `num` concatenation) and in the worst case creates three string objects (one more for the `gen` concatenation). For the Tracemonkey paper alone, this resulted in 12000 string objects when scrolling from the top to the bottom of the document. Since this is a hot function, it's worth minimizing the number of string objects, especially for large documents, to reduce peak memory usage. This commit refactors the `toString` method to always create only one string object.	2018-12-29 17:48:41 +01:00
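The commit above optimizes `Ref.toString` to create fewer strings. The sketch below builds the key with one template literal per call and special-cases the common `gen === 0`. The key format (`num`, then `R`, then a non-zero `gen`) follows the commit description. Exactly how many string objects an engine allocates remains an implementation detail.

```javascript
class Ref {
  constructor(num, gen) {
    this.num = num;
    this.gen = gen;
  }

  toString() {
    // One template literal per call; `gen` is almost always zero, so that
    // common case gets the shortest key.
    if (this.gen !== 0) {
      return `${this.num}R${this.gen}`;
    }
    return `${this.num}R`;
  }
}
```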
Jonas Jenwald	60bcce184e	Check that the first page can be successfully loaded, to try and ascertain the validity of the XRef table (issue 7496, issue 10326) For PDF documents with sufficiently broken XRef tables, it's usually quite obvious when you need to fall back to indexing the entire file. However, for certain kinds of corrupted PDF documents the XRef table will, for all intents and purposes, appear to be valid. It's not until you actually try to fetch various objects that things will start to break, which is the case in the referenced issues[1].

Since there's generally a real effort being made in PDF.js to load even corrupt PDF documents, this patch contains a suggested approach to attempt to do a bit more validation of the XRef table during the initial document loading phase. Here the choice is made to attempt to load the first page, as a basic sanity check of the validity of the XRef table. Please note that attempting to load a more-or-less arbitrarily chosen object without any context of what it's supposed to be isn't very useful, which is why this particular choice was made.

Obviously, just because the first page can be loaded successfully that doesn't guarantee that the entire XRef table is valid, however if even the first page fails to load you can be reasonably sure that the document is not valid[2].

Even though this patch won't cause any significant increase in the amount of parsing required during initial loading of the document[3], it will require loading of more data upfront which thus delays the initial `getDocument` call. Whether or not this is a problem depends very much on what you actually measure; please consider the following examples:

```javascript
console.time('first');
getDocument(...).promise.then((pdfDocument) => {
  console.timeEnd('first');
});

console.time('second');
getDocument(...).promise.then((pdfDocument) => {
  // Note: the API uses `pageNumber >= 1`, the Worker uses `pageIndex >= 0`.
  pdfDocument.getPage(1).then((pdfPage) => {
    console.timeEnd('second');
  });
});
```

The first case is pretty much guaranteed to show a small regression, however the second case won't be affected at all since the Worker caches the result of `getPage` calls. Again, please remember that the second case is what matters for the standard PDF.js use-case, which is why I'm hoping that this patch is deemed acceptable.

---
[1] In issue 7496, the problem is that the document is edited without the XRef table being correctly updated. In issue 10326, the generator was sorting the XRef table according to the offsets rather than the objects.
[2] The idea of checking the first page in particular came from the "standard" use-case for the PDF.js library, i.e. the default viewer, where a failure to load the first page basically means that nothing will work; note how `{BaseViewer, PDFThumbnailViewer}.setDocument` depends completely on being able to fetch the first page.
[3] The only extra parsing is caused by, potentially, having to traverse part of the `Pages` tree to find the first page.	2018-12-29 12:47:25 +01:00
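The control flow described in the commit above is: parse the XRef table, try the first page, and fall back to indexing the whole file only when a bad XRef entry is detected. The sketch below captures that flow. `XRefEntryError` and the three callbacks are hypothetical stand-ins. The actual Worker-side code is structured differently.

```javascript
// Hypothetical error type signalling a broken XRef entry.
class XRefEntryError extends Error {}

// Sketch: after parsing the XRef table, load the first page as a sanity
// check; on a bad XRef entry, rebuild the table and try once more.
async function loadDocumentWithXRefCheck({ parseXRef, reindexXRef, getFirstPage }) {
  await parseXRef();
  try {
    await getFirstPage();
  } catch (reason) {
    if (!(reason instanceof XRefEntryError)) {
      throw reason; // Unrelated failure; don't mask it.
    }
    await reindexXRef(); // Fall back to indexing the entire file.
    await getFirstPage();
  }
}
```

Only the XRef-specific error triggers the expensive re-indexing, so unrelated failures still surface to the caller unchanged.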
Tim van der Meij	360c3d3813	Remove the unused `url` argument for the `ChunkedStreamManager` class	2018-12-24 13:14:42 +01:00
Tim van der Meij	47344197f4	Convert `src/core/chunked_stream.js` to ES6 syntax	2018-12-24 13:14:42 +01:00
Tim van der Meij	103f4616ac	Merge pull request #10334 from Snuffleupagus/OpenAction-dest [api-minor] Add support for OpenAction destinations (issue 10332)	2018-12-23 20:49:50 +01:00
Jonas Jenwald	f0719ed565	[api-minor] Change the `getViewport` method, on `PDFPageProxy`, to take a parameter object rather than a bunch of (randomly) ordered parameters If, as PR 10368 suggests, more parameters should be added to `getViewport` I think that it would be a mistake to not change the signature first to avoid needlessly unwieldy call-sites. To not break any existing code and third-party use-cases, this is obviously implemented with a deprecation warning and with a working fallback[1] for the old method signature.

---
[1] This is limited to `GENERIC` builds, which should be sufficient.	2018-12-21 11:55:20 +01:00
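The commit above moves `getViewport` to a parameter object while keeping a deprecated positional fallback. The sketch below shows that pattern. `PageProxySketch` is hypothetical, and it returns a plain object instead of a real viewport. The warning text is illustrative, not the library's message.

```javascript
class PageProxySketch {
  constructor(defaultRotation = 0) {
    this.defaultRotation = defaultRotation;
  }

  getViewport(options = {}) {
    // Old call style: getViewport(scale, rotation, dontFlip).
    if (typeof options === "number" || arguments.length > 1) {
      console.warn("Positional getViewport arguments are deprecated; pass an object.");
      return this.getViewport({
        scale: arguments[0],
        rotation: arguments[1],
        dontFlip: arguments[2],
      });
    }
    // New call style: named, order-independent, with defaults.
    const { scale, rotation = this.defaultRotation, dontFlip = false } = options;
    return { scale, rotation, dontFlip };
  }
}
```

New call-sites then read as `page.getViewport({ scale: 1.5 })`, and further options can be added without reordering anything.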
Jonas Jenwald	b05f053287	[api-minor] Add support for OpenAction destinations (issue 10332) Note that the OpenAction dictionary may contain other information besides just a destination array, e.g. instructions for auto-printing[1]. Given first of all that an arbitrary `Dict` cannot be sent from the Worker (since cloning would fail), and second of all that the data obviously needs to be validated, this patch purposely only adds support for fetching a destination from the OpenAction entry[2].

---
[1] This information is, currently in PDF.js, being included through the `getJavaScript` API method.
[2] This significantly reduces the complexity of the implementation, which seems fine for now. If there's ever need for other kinds of OpenAction to be fetched, additional API methods could/should be implemented as necessary (could e.g. follow the `getOpenActionWhatever` naming scheme).	2018-12-19 11:45:16 +01:00
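An `/OpenAction` entry is either a destination array or an action dictionary. For a GoTo action, the destination is in `/D`. The sketch below extracts only an explicit destination array and returns `null` for everything else. Other actions (e.g. JavaScript for auto-printing) and named destinations are deliberately not handled. PDF objects are modelled as plain JS values, and `extractOpenActionDest` is a hypothetical name.

```javascript
// Sketch: return an explicit destination array from an OpenAction entry,
// or null for anything else (other action types, malformed data).
function extractOpenActionDest(openAction) {
  if (Array.isArray(openAction)) {
    return openAction; // OpenAction given directly as a destination.
  }
  if (
    openAction !== null &&
    typeof openAction === "object" &&
    openAction.S === "GoTo" &&
    Array.isArray(openAction.D)
  ) {
    return openAction.D; // GoTo action wrapping an explicit destination.
  }
  return null;
}
```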
Jonas Jenwald	ba2edeae18	[api-minor] Add support, in `getMetadata`, for custom information dictionary entries (issue 5970, issue 10344) (#10346) The custom entries, provided that they exist and that their types are safe to include, are exposed through a new `Custom` infoDict entry to clearly separate them from the standard ones. Fixes 5970. Fixes 10344.	2018-12-18 23:26:02 +01:00
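The commit above groups non-standard Info dictionary entries under a `Custom` key. The sketch below illustrates that grouping, with only "safe" values kept. The `Custom` key comes from the commit message. The validator table is a hypothetical subset of the standard keys. Treating only strings, numbers and booleans as safe is an assumption of this sketch.

```javascript
// Hypothetical subset of standard Info keys, each with a type validator.
const isString = (v) => typeof v === "string";
const STANDARD_INFO_KEYS = {
  Title: isString,
  Author: isString,
  Subject: isString,
  Keywords: isString,
  Creator: isString,
  Producer: isString,
};

// Only simple values are treated as safe to send from the worker (assumption).
function isSafeCustomValue(value) {
  return ["string", "number", "boolean"].includes(typeof value);
}

function buildInfo(infoDict) {
  const info = {};
  for (const [key, value] of Object.entries(infoDict)) {
    if (Object.prototype.hasOwnProperty.call(STANDARD_INFO_KEYS, key)) {
      if (STANDARD_INFO_KEYS[key](value)) {
        info[key] = value; // Standard entry with a valid type.
      }
    } else if (isSafeCustomValue(value)) {
      // Non-standard entries are kept apart from the standard ones.
      if (!info.Custom) {
        info.Custom = {};
      }
      info.Custom[key] = value;
    }
  }
  return info;
}
```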
Jonas Jenwald	437fb8a8a7	Ignore the `fieldValue` for Signature annotations, since they're currently unsupported (issue 10374) Given that Signature (Widget) annotations are currently not supported, since they cannot be validated, simply ignoring the `fieldValue` seems OK for now considering that attempting to blindly include unparsed/unvalidated data isn't very useful. Fixes 10347.	2018-12-12 18:01:43 +01:00
Tim van der Meij	45c0197465	Merge pull request #10330 from janpe2/svg-line-width-zero Handle line width of zero in SVG	2018-12-07 23:34:27 +01:00


3460 Commits