pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	b531fc4106	Avoid truncating inline images, where the data and the "EI" marker are glued together (issue 10388) (#10436) Thanks to the excellent debugging done by @janpe2, this was easy to fix!	2019-01-12 20:31:23 +01:00
Tim van der Meij	eb7cd884ed	Merge pull request #10441 from Snuffleupagus/indexObjects-more-nested-obj Handle more cases of corrupt PDF files with missing 'endobj' operators, where the "obj" string is immediately followed by the dictionary (PR 9288 follow-up)	2019-01-12 20:02:09 +01:00
Jonas Jenwald	d4a3858ed5	Handle more cases of corrupt PDF files with missing 'endobj' operators, where the "obj" string is immediately followed by the dictionary (PR 9288 follow-up)	2019-01-10 17:55:28 +01:00
Tim van der Meij	e4d2a1604e	Merge pull request #10423 from Snuffleupagus/historyUpdateUrl Add support for updating the document hash, off by default, when the browser history is updated (issue 5753)	2019-01-06 20:18:12 +01:00
Jonas Jenwald	358cd0c096	Add a few more `String` polyfills (startsWith, endsWith, padStart, padEnd)	2019-01-06 20:10:55 +01:00
Jonas Jenwald	4773bf6fcb	Add support for updating the document hash, off by default, when the browser history is updated (issue 5753) This is really the best that we can do here, since other proposed solutions would interfere with (and break) the painstakingly implemented browsing history that's present in the default viewer. I'm still not convinced that this is a good idea in general, but this patch implements it in a way where it is possible to toggle[1] for users that wish to have this feature. In particular, there are a couple of reasons why I don't find this feature necessary/great: - It's already possible to easily obtain the current hash, by simply clicking on the `viewBookmark` button at any time. - Hash changes require a bit of special handling[2], i.e. extra code, to prevent issues when the browser history is traversed (see `PDFHistory._popState`). Currently this is only necessary when the user has manually changed the hash; with this patch it will always be the case (assuming the feature is active). - It's not always possible to change the URL when updating the browser history. For example: In the Firefox built-in viewer, the URL cannot be modified for local files (i.e. those using the `file://` protocol). This leads to inconsistent behaviour, and may in some cases even result in errors being thrown and the history thus not updating, if the browser prevents changes to the URL during `pushState`/`replaceState` calls. --- [1] Using the `historyUpdateUrl` viewer preference. [2] This depends, to a great extent, on browsers always firing `popstate` events before `hashchange` events, which may or may not actually be guaranteed.	2019-01-06 20:09:02 +01:00
Tim van der Meij	af31b980b0	Merge pull request #10424 from Snuffleupagus/issue-6847 Accept non-matching document fingerprints, in `PDFHistory`, when the viewer is reloaded (issue 6847)	2019-01-06 19:08:03 +01:00
Jonas Jenwald	d46715210a	Accept non-matching document fingerprints, in `PDFHistory`, when the viewer is reloaded (issue 6847) This should hopefully be sufficient to address issue 6847, and given the limited impact of the code changes I'm not completely sure if this would need to be controlled by a preference!? Initially my intention was to try and provide some (slightly more detailed) implementation suggestions in the issue, but having looked briefly at doing that it would essentially have amounted to actually writing the code anyway. (Especially considering that the recent questions seemed to more-or-less ignore the information already provided in the first post.) Finally, note that since `performance.navigation.type` is marked as deprecated, a slightly different approach was chosen instead.	2019-01-06 17:02:39 +01:00
Tim van der Meij	968a153180	Merge pull request #10422 from timvandermeij/es6 Convert more files in `src/core` to ES6 syntax	2019-01-06 15:06:57 +01:00
Tim van der Meij	f162fed6b9	Convert `src/core/charsets.js` and `src/core/standard_fonts.js` to ES6 syntax Moreover, add the "no var" ESLint comment to `src/core/annotation.js` and `src/core/ps_parser.js` since they are already converted.	2019-01-06 15:04:01 +01:00
Tim van der Meij	3b637e71d4	Convert `src/core/arithmetic_decoder.js` to ES6 syntax	2019-01-06 15:04:01 +01:00
Tim van der Meij	c967eab8b1	Merge pull request #10421 from timvandermeij/fix Switch to HTTPS for the license link on the website	2019-01-05 15:36:39 +01:00
Tim van der Meij	be3defdd94	Switch to HTTPS for the license link on the website Moreover, fix a small oversight in how the file tree is rendered.	2019-01-05 15:35:17 +01:00
Tim van der Meij	7307c60407	Merge pull request #10420 from timvandermeij/misc Update translations/packages and improve documentation	2019-01-05 15:27:06 +01:00
Tim van der Meij	61dcc41a3c	Clarify that `gulp dist-install` should be used for the AcroForms example Fixes #10333.	2019-01-05 15:20:50 +01:00
Tim van der Meij	f32dcbc089	Improve the file layout overview on the website Mention only the relevant files/folders and update the overview to match the current file trees. Fixes #10384.	2019-01-05 15:20:50 +01:00
Tim van der Meij	ca04a397bb	Update packages	2019-01-05 14:27:47 +01:00
Tim van der Meij	825aceb648	Update translations	2019-01-05 14:24:44 +01:00
Tim van der Meij	b81984f0cb	Merge pull request #10417 from brendandahl/metric-length Fix reading number of HMTX metrics.	2019-01-05 13:35:16 +01:00
Tim van der Meij	a0eb5cf9d5	Merge pull request #10412 from Snuffleupagus/issue-10410 Prevent errors, in `SimpleXMLParser.onEndElement`, when the stack has already been completely parsed (issue 10410)	2019-01-05 12:56:43 +01:00
Jonas Jenwald	e8f4b47d59	Prevent errors, in `SimpleXMLParser.onEndElement`, when the stack has already been completely parsed (issue 10410) The error was triggered for a particular set of metadata, where an end tag was encountered without the corresponding begin tag being present in the data. (The patch also fixes a minor oversight, from a recent PR, in the `SimpleDOMNode.nextSibling` method.)	2019-01-05 11:15:34 +01:00
Brendan Dahl	32eace043b	Fix reading number of HMTX metrics. The length of the HHEA table can be incorrect, so it is better to read the number of metrics at an offset from the beginning of the table instead.	2019-01-04 15:13:13 -08:00
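A minimal sketch of the idea in the commit above, not the PDF.js code itself. It assumes the standard OpenType `hhea` layout, where `numberOfHMetrics` is a big-endian uint16 at byte offset 34 from the start of the table. The helper name `readNumberOfHMetrics` and the sample data are hypothetical.

```javascript
// Hypothetical helper, for illustration only (not taken from PDF.js).
// In the OpenType `hhea` table, `numberOfHMetrics` is a big-endian uint16
// stored 34 bytes after the start of the table.
const NUMBER_OF_HMETRICS_OFFSET = 34;

function readNumberOfHMetrics(hheaData) {
  if (hheaData.length < NUMBER_OF_HMETRICS_OFFSET + 2) {
    throw new Error("hhea table is too short");
  }
  // Index from the start of the table rather than from its end, so that a
  // wrong table length cannot shift the position that is read.
  return (
    (hheaData[NUMBER_OF_HMETRICS_OFFSET] << 8) |
    hheaData[NUMBER_OF_HMETRICS_OFFSET + 1]
  );
}

// A 36-byte table followed by 4 bytes of trailing data, i.e. a table whose
// reported length is larger than it should be.
const table = new Uint8Array(40);
table[34] = 0x01;
table[35] = 0x02;
console.log(readNumberOfHMetrics(table)); // 258
```

Reading relative to the end of `table` would pick up the trailing bytes instead, which is the kind of failure the commit message describes.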
Tim van der Meij	b39ec7af96	Merge pull request #10408 from Snuffleupagus/issue-10407 Prevent errors, because of incorrect scope, in the `XMLParserBase._resolveEntities` method (issue 10407)	2019-01-04 23:45:26 +01:00
Tim van der Meij	3f9e9f0d88	Merge pull request #10411 from Snuffleupagus/issue-10385-2 Adjust how `AnnotationBorderStyle.setWidth` handles the input being a `Name` (issue 10385)	2019-01-04 23:42:08 +01:00
Jonas Jenwald	66fccd860b	Adjust how `AnnotationBorderStyle.setWidth` handles the input being a `Name` (issue 10385) In order to be consistent with the behaviour in Adobe Reader, the width will now always be set to zero when the input is a `Name`.	2019-01-04 10:38:10 +01:00
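The behaviour described in the commit above could look roughly like the following sketch. The `Name` and `BorderStyle` classes here are hypothetical stand-ins, not the PDF.js implementations, and the handling of other input types is an assumption made only to keep the example complete.

```javascript
// Hypothetical stand-in for a PDF name object.
class Name {
  constructor(name) {
    this.name = name;
  }
}

// Hypothetical stand-in for a border style; the default width of 1 is an
// assumption of this sketch.
class BorderStyle {
  constructor() {
    this.width = 1;
  }

  setWidth(width) {
    if (width instanceof Name) {
      // Corrupt documents may contain a name where a number is expected.
      // Per the commit message, the width is then always set to zero.
      this.width = 0;
      return;
    }
    // Assumption: only non-negative numbers are accepted; anything else
    // leaves the current width untouched.
    if (typeof width === "number" && width >= 0) {
      this.width = width;
    }
  }
}

const style = new BorderStyle();
style.setWidth(new Name("Foo"));
console.log(style.width); // 0
```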
Jonas Jenwald	6cd9ff48f3	Prevent errors, because of incorrect scope, in the `XMLParserBase._resolveEntities` method (issue 10407)	2019-01-04 10:13:32 +01:00
Tim van der Meij	5a2bd9fc63	Merge pull request #10399 from MohammedEssehemy/master migrate to canvas 2.x api	2019-01-03 23:32:56 +01:00
Tim van der Meij	2d00bb098b	Merge pull request #10404 from Snuffleupagus/issue-10401 Remove the `for ... of` loop from the `PDFDocument.fingerprint` getter (issue 10401)	2019-01-03 22:46:51 +01:00
Brendan Dahl	e2686db49b	Merge pull request #10277 from janpe2/cff-stems Repair CFF fonts if stem hints are in wrong order	2019-01-03 10:30:43 -08:00
Jonas Jenwald	8c278530dd	Remove the `for ... of` loop from the `PDFDocument.fingerprint` getter (issue 10401) It appears that the `Symbol` polyfill doesn't work well in conjunction with `TypedArray`s, and that part of PR 10393 is thus reverted.	2019-01-03 11:17:45 +01:00
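As an illustration of the commit above, a plain index loop over a typed array does not depend on `Symbol.iterator`, whereas `for ... of` does. The helper below is a hypothetical sketch of hex-encoding bytes that way, not the actual `PDFDocument.fingerprint` code.

```javascript
// Hypothetical helper, for illustration only.
// Hex-encodes a byte array with an index loop, so no iterator support
// (and hence no `Symbol` polyfill) is needed for the typed array.
function bytesToHex(bytes) {
  const parts = [];
  for (let i = 0, ii = bytes.length; i < ii; i++) {
    const hex = bytes[i].toString(16);
    // Pad single-digit values so every byte yields two characters.
    parts.push(hex.length === 1 ? "0" + hex : hex);
  }
  return parts.join("");
}

console.log(bytesToHex(new Uint8Array([0, 15, 255]))); // "000fff"
```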
Mohammed Essehemy	f0e9df745c	migrate to canvas 2.x api	2019-01-02 01:10:07 +02:00
Tim van der Meij	1b84b2ed60	Merge pull request #10398 from Snuffleupagus/issue-10395 Prevent errors in various methods in `SimpleDOMNode` when the `childNodes` property is not defined (issue 10395)	2019-01-01 16:22:11 +01:00
Jonas Jenwald	d371d23382	Prevent errors in various methods in `SimpleDOMNode` when the `childNodes` property is not defined (issue 10395) Given that the issue, as filed, is incomplete since no PDF file was provided for debugging, this patch is really the best that we can do here. Please note: This patch will not enable the Metadata to be successfully parsed, but it should at least prevent the errors.	2018-12-31 13:07:15 +01:00
Tim van der Meij	d8f201ea2a	Merge pull request #10397 from Snuffleupagus/issue-10385 Ensure that `AnnotationBorderStyle.setWidth` is able to handle the input being a `Name`, to correctly deal with corrupt PDF documents (issue 10385)	2018-12-31 12:58:28 +01:00
Tim van der Meij	2cdeb93b5f	Merge pull request #10396 from timvandermeij/optimizations Optimizations to avoid intermediate string creation	2018-12-31 12:50:00 +01:00
Jonas Jenwald	76a9580aeb	Ensure that `AnnotationBorderStyle.setWidth` is able to handle the input being a `Name`, to correctly deal with corrupt PDF documents (issue 10385)	2018-12-31 12:21:28 +01:00
Jonas Jenwald	15b3806937	Actually validate the input in `AnnotationBorderStyle.setStyle`	2018-12-31 12:15:15 +01:00
Tim van der Meij	5b57e69da2	Optimize `CanvasGraphics.setFont` to avoid intermediate string creation This method creates quite a few intermediate strings on each call and it's called often, even for smaller documents like the Tracemonkey document. Scrolling from top to bottom in that document resulted in 14126 strings being created in this method. With this commit applied, this is reduced to 2018 strings.	2018-12-30 14:58:32 +01:00
Tim van der Meij	95f9075565	Optimize `TextLayerRenderTask._layoutText` to avoid intermediate string creation This method creates quite a few intermediate strings on each call and it's called often, even for smaller documents like the Tracemonkey document. Scrolling from top to bottom in that document resulted in 12936 strings being created in this method. With this commit applied, this is reduced to 3610 strings.	2018-12-30 14:39:08 +01:00
Tim van der Meij	7c080584b6	Merge pull request #10393 from timvandermeij/document Convert `src/core/document.js` to ES6 syntax	2018-12-30 14:23:38 +01:00
Tim van der Meij	d5e5d18430	Convert the `PDFDocument` class in `src/core/document.js` to ES6 syntax	2018-12-30 13:54:43 +01:00
Tim van der Meij	612fc9fcc2	Convert the `Page` class in `src/core/document.js` to ES6 syntax	2018-12-30 13:54:43 +01:00
Tim van der Meij	85363f4566	Merge pull request #10394 from timvandermeij/primitives-optimization Optimize the `Ref` class in `src/core/primitives.js`	2018-12-30 12:30:09 +01:00
Tim van der Meij	aad27ff9a0	Optimize the `Ref` class in `src/core/primitives.js` The `toString` method always creates two string objects (for the 'R' character and for the `num` concatenation) and in the worst case creates three string objects (one more for the `gen` concatenation). For the Tracemonkey paper alone, this resulted in 12000 string objects when scrolling from the top to the bottom of the document. Since this is a hot function, it's worth minimizing the number of string objects, especially for large documents, to reduce peak memory usage. This commit refactors the `toString` method to always create only one string object.	2018-12-29 17:48:41 +01:00
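One way to build the result in a single step, as the commit above describes, is to use one template literal per branch. This is a sketch under that assumption; the `Ref` class below is a simplified stand-in and may differ from the code in `src/core/primitives.js`. How many string objects an engine actually allocates is an implementation detail.

```javascript
// Simplified, hypothetical stand-in for the `Ref` class.
class Ref {
  constructor(num, gen) {
    this.num = num;
    this.gen = gen;
  }

  toString() {
    // Each branch produces its result from a single template literal,
    // instead of concatenating `num`, "R" and `gen` step by step.
    if (this.gen !== 0) {
      return `${this.num}R${this.gen}`;
    }
    return `${this.num}R`;
  }
}

console.log(new Ref(12, 0).toString()); // "12R"
console.log(new Ref(12, 3).toString()); // "12R3"
```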
Tim van der Meij	e53877f372	Merge pull request #10392 from Snuffleupagus/checkFirstPage Check that the first page can be successfully loaded, to try and ascertain the validity of the XRef table (issue 7496, issue 10326)	2018-12-29 15:13:19 +01:00
Jonas Jenwald	60bcce184e	Check that the first page can be successfully loaded, to try and ascertain the validity of the XRef table (issue 7496, issue 10326) For PDF documents with sufficiently broken XRef tables, it's usually quite obvious when you need to fall back to indexing the entire file. However, for certain kinds of corrupted PDF documents the XRef table will, for all intents and purposes, appear to be valid. It's not until you actually try to fetch various objects that things will start to break, which is the case in the referenced issues[1]. Since there's generally a real effort being made in PDF.js to load even corrupt PDF documents, this patch contains a suggested approach to attempt to do a bit more validation of the XRef table during the initial document loading phase. Here the choice is made to attempt to load the first page, as a basic sanity check of the validity of the XRef table. Please note that attempting to load a more-or-less arbitrarily chosen object without any context of what it's supposed to be isn't very useful, which is why this particular choice was made. Obviously, just because the first page can be loaded successfully that doesn't guarantee that the entire XRef table is valid, however if even the first page fails to load you can be reasonably sure that the document is not valid[2]. Even though this patch won't cause any significant increase in the amount of parsing required during initial loading of the document[3], it will require loading of more data upfront which thus delays the initial `getDocument` call. Whether or not this is a problem depends very much on what you actually measure; please consider the following examples:
```javascript
console.time('first');
getDocument(...).promise.then((pdfDocument) => {
  console.timeEnd('first');
});

console.time('second');
getDocument(...).promise.then((pdfDocument) => {
  pdfDocument.getPage(1).then((pdfPage) => {
    // Note: the API uses `pageNumber >= 1`, the Worker uses `pageIndex >= 0`.
    console.timeEnd('second');
  });
});
```
The first case is pretty much guaranteed to show a small regression, however the second case won't be affected at all since the Worker caches the result of `getPage` calls. Again, please remember that the second case is what matters for the standard PDF.js use-case which is why I'm hoping that this patch is deemed acceptable. --- [1] In issue 7496, the problem is that the document is edited without the XRef table being correctly updated. In issue 10326, the generator was sorting the XRef table according to the offsets rather than the objects. [2] The idea of checking the first page in particular came from the "standard" use-case for the PDF.js library, i.e. the default viewer, where a failure to load the first page basically means that nothing will work; note how `{BaseViewer, PDFThumbnailViewer}.setDocument` depends completely on being able to fetch the first page. [3] The only extra parsing is caused by, potentially, having to traverse part of the `Pages` tree to find the first page.	2018-12-29 12:47:25 +01:00
Tim van der Meij	d3868e1bd1	Merge pull request #10376 from timvandermeij/chunked-stream Convert `src/core/chunked_stream.js` to ES6 syntax	2018-12-25 15:28:00 +01:00
Tim van der Meij	360c3d3813	Remove the unused `url` argument for the `ChunkedStreamManager` class	2018-12-24 13:14:42 +01:00
Tim van der Meij	47344197f4	Convert `src/core/chunked_stream.js` to ES6 syntax	2018-12-24 13:14:42 +01:00
Tim van der Meij	2e05827b87	Merge pull request #10378 from Snuffleupagus/issue-10377 Update remaining examples, and docs, to utilize current API functionality (issue 10377)	2018-12-24 13:13:10 +01:00


11496 Commits