pdf.js

Author	SHA1	Message	Date
Thomas den Hollander	b24a14738a	Update test case description	2019-03-20 12:52:32 +01:00
Tim van der Meij	33bfbef6ba	Merge pull request #10635 from timvandermeij/lexer-parser Convert `src/core/parser.js` to ES6 syntax and write more unit tests for the lexer and the parser	2019-03-19 23:17:34 +01:00
Tim van der Meij	4a4b197b9d	Write more unit tests for the lexer and the parser Moreover, group the lexer unit tests per method. This matches what we do for other classes and makes it more easily visible which methods we don't or insufficiently unit test. The parser itself is not unit tested yet, so this patch provides a start for doing so. The `inlineStreamSkipEI` method is used in other end marker detection methods, so it's important that its functionality is correct for proper parsing.	2019-03-17 13:36:23 +01:00
Tim van der Meij	2ee299a62b	Convert `test/unit/parser_spec.js` to ES6 syntax Moreover, disable `var` usage for this file.	2019-03-17 13:27:46 +01:00
Tim van der Meij	80135378ca	Merge pull request #10636 from Snuffleupagus/PDFDocumentProxy-destroy Small clean-up of the `PDFDocumentProxy.destroy` method and related code	2019-03-13 23:46:41 +01:00
Jonas Jenwald	24fc4f83ca	Small clean-up of the `PDFDocumentProxy.destroy` method and related code Note how `PDFDocumentProxy.destroy` is nothing more than an alias for `PDFDocumentLoadingTask.destroy`. While removing the latter method would be a breaking API change, there's still room for at least some clean-up here. The main changes in this patch are: - Stop providing a `PDFDocumentLoadingTask` instance separately when creating a `PDFDocumentProxy`, since the loadingTask is already available through the `WorkerTransport` instance. - Stop tracking the `PDFDocumentProxy` instance on the `WorkerTransport`, since that property is completely unused. - Simplify the 'Multiple `getDocument` instances' unit-tests by only destroying once, rather than twice, for each document.	2019-03-12 13:25:29 +01:00
Jonas Jenwald	88f9e633dd	Try to improve text-selection for Type3 fonts that utilize a non-default /FontMatrix (bug 1513120) For Type3 fonts text-selection is often not that great, and there's a couple of heuristics used to try and improve things. This patch simply extends those heuristics a bit, and fixes a pre-existing "naive" array comparison, but this all feels a bit brittle to say the least. The existing Type3 test-coverage isn't that great in general, and in particular Type3 `text` tests are few and far between, hence why this patch adds two different new `text` tests.	2019-03-12 10:32:08 +01:00
Tim van der Meij	8b149b818e	Merge pull request #10615 from Snuffleupagus/corrupt-inline-ASCII85Decode Handle corrupt ASCII85Decode inline images with whitespace "inside" of the EOD marker (issue 10614)	2019-03-08 23:06:01 +01:00
Tim van der Meij	b244622f7e	Improve unit test coverage for `src/display/display_utils.js` The `DOMCanvasFactory` class is now fully covered. Moreover, missing cases for the `getFilenameFromUrl` function have been included. Finally, `var` usage has been removed.	2019-03-06 23:41:54 +01:00
Jonas Jenwald	3ce8fe7927	Handle corrupt ASCII85Decode inline images with whitespace "inside" of the EOD marker (issue 10614) There's a number of things wrong with the PDF document, since its inline images are first of all a lot larger than the 4 KB limit (as mandated by the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.1852045). Furthermore the actual ASCII85Decode data is interspersed with a lot of needless whitespace, in particular also "inside" of the EOD (end-of-data) marker which thus completely breaks the detection. Note that according to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1940130, this patch should be safe since it explicitly mentions that all whitespace should be ignored.	2019-03-04 23:41:36 +01:00
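The whitespace-tolerant EOD detection described in the commit above can be illustrated with a minimal sketch. This is not the code from the patch: `isWhitespace` and `findAscii85End` are hypothetical helpers, and the only assumption taken from the message is that whitespace between `~` and `>` should be ignored.

```javascript
// Hypothetical helper: PDF whitespace characters (NUL, TAB, LF, FF, CR, SPACE).
function isWhitespace(ch) {
  return ch === 0x00 || ch === 0x09 || ch === 0x0a ||
         ch === 0x0c || ch === 0x0d || ch === 0x20;
}

// Hypothetical helper: returns the index just past the ASCII85 EOD marker
// ("~>"), allowing whitespace between "~" and ">", or -1 if none is found.
function findAscii85End(bytes) {
  const TILDE = 0x7e, GT = 0x3e;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] !== TILDE) {
      continue;
    }
    let j = i + 1;
    while (j < bytes.length && isWhitespace(bytes[j])) {
      j++; // Skip whitespace "inside" of the marker.
    }
    if (j < bytes.length && bytes[j] === GT) {
      return j + 1;
    }
  }
  return -1;
}

// Convert a string to an array of byte values, for illustration only.
function toBytes(str) {
  return Array.from(str, (c) => c.charCodeAt(0));
}
```

For example, `findAscii85End(toBytes('ab~ \n>X'))` locates the marker even though a space and a newline separate its two characters.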
Brendan Dahl	34022d2fd1	Merge pull request #10591 from brendandahl/fix-charset Add unique glyph names for CFF fonts.	2019-02-28 17:22:29 -08:00
Tim van der Meij	af5597b7e5	Merge pull request #10573 from Snuffleupagus/type3-avoid-truncation Avoid truncating/breaking some Type3 glyphs in `compileType3Glyph` (bug 1245391, issue 10568)	2019-02-28 23:25:45 +01:00
Brendan Dahl	8a596ef5d5	Add unique glyph names for CFF fonts. Printing on MacOS was broken with the previous approach of just mapping all the glyphs to notdef.	2019-02-27 15:00:29 -08:00
Jonas Jenwald	f664e074c9	Avoid using the Fetch API, in `GENERIC` builds, for unsupported protocols (issue 10587)	2019-02-27 13:04:20 +01:00
Jonas Jenwald	cbc07f985b	Load built-in CMap files using the Fetch API when possible	2019-02-27 13:04:19 +01:00
Jonas Jenwald	c5cf3ab808	Run the `custom_spec` unit-tests in Node.js/Travis (PR 10537 follow-up)	2019-02-26 22:40:55 +01:00
Jonas Jenwald	db5dc14158	Move worker-thread only functions from `src/shared/util.js` and into a new `src/core/core_utils.js` file The `src/shared/util.js` file is being bundled into both the `pdf.js` and `pdf.worker.js` files, meaning that its code is by definition duplicated. Some main-thread only utility functions have already been moved to a separate `src/display/display_utils.js` file, and this patch simply extends that concept to utility functions which are used only on the worker-thread. Note in particular the `getInheritableProperty` function, which expects a `Dict` as input and thus cannot possibly ever be used on the main-thread.	2019-02-24 00:35:39 +01:00
Jonas Jenwald	a1f7517996	Rename the `src/display/dom_utils.js` file to `src/display/display_utils.js` This file (currently) contains not only DOM-specific helper functions/classes, but is used generally for various helper code relevant for main-thread functionality.	2019-02-23 16:30:16 +01:00
Jonas Jenwald	fb774a65b0	Avoid truncating/breaking some Type3 glyphs in `compileType3Glyph` (bug 1245391, issue 10568) Hopefully this patch makes sense, since I cannot claim to fully understand this function. With the changes made in PR 3354 some Type3 glyph outlines are no longer rendering correctly, since the final paths were being accidentally ignored. The fact that Type3 fonts are not very common in PDF documents, and that most Type3 glyphs are unaffected by this regression, probably explains why this has gone unnoticed since 2013.	2019-02-21 23:29:43 +01:00
Jonas Jenwald	a0354494bd	Re-factor the `PDFDataRangeTransport` unit-tests and enable them in Node.js/Travis There doesn't appear to be any particular reason for only running these unit-tests in browsers, since the `PDFDataRangeTransport` functionality itself should be back-end agnostic.	2019-02-17 14:45:17 +01:00
Jonas Jenwald	507e0a4907	Add a new `DOMFileReaderFactory` helper to the unit-tests, and re-factor `NodeFileReaderFactory` to be asynchronous This allows simplification of the 'creates pdf doc from URL and aborts loading after worker initialized' API unit-test. Note that the `DOMFileReaderFactory` uses the Fetch API, for simplicity, since it should be available in all browsers where we're running tests.	2019-02-17 14:41:14 +01:00
Jonas Jenwald	60f6d49ff7	[api-minor] Expose the existence of a `Collection` dictionary via the `getMetadata` API method (issue 10555) Given the complexity of this functionality, and the fact that it doesn't seem widely used, I highly doubt that it'd ever make sense to support Collections; see also https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#M11.9.39646.2Heading.824.Collections	2019-02-15 15:40:31 +01:00
Tim van der Meij	1d90c76097	Merge pull request #10537 from timvandermeij/unittest Improve unit test coverage	2019-02-12 00:12:29 +01:00
Tim van der Meij	7c91e94b19	Implement the `NodeCanvasFactory` class to execute more unit tests in Node.js	2019-02-10 19:37:34 +01:00
Tim van der Meij	b6eddc40b5	Write unit tests for the `string32` and `toRomanNumerals` utility functions	2019-02-10 18:58:52 +01:00
Tsukasa OI	96ba6afd47	Fix copying on supplementary plane characters pdf.js had a problem when copying characters on supplementary planes (0xPPXXXX where PP is nonzero). This is because certain methods of PartialEvaluator use classic String.fromCharCode instead of ES6's String.fromCodePoint. Despite the fact that readToUnicode method tried to parse out-of-UCS2 code points by parsing UTF-16BE, it was inadequate because String.fromCharCode only supports UCS-2 range of Unicode.	2019-02-10 18:14:53 +09:00
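The difference between the two string methods named in the commit above can be shown with a short, self-contained sketch. The code point used here (U+1F600) is an arbitrary example and does not come from the patch.

```javascript
// U+1F600 lies on a supplementary plane (its code point is above 0xFFFF).
const codePoint = 0x1f600;

// String.fromCharCode truncates its argument to 16 bits, so the result is a
// single, unrelated UTF-16 code unit (0xF600) instead of the intended character.
const truncated = String.fromCharCode(codePoint);

// String.fromCodePoint (ES6) emits the correct surrogate pair.
const correct = String.fromCodePoint(codePoint);

console.log(truncated.length, truncated.charCodeAt(0).toString(16));
console.log(correct.length, correct.codePointAt(0).toString(16));
```

Copying text therefore only round-trips for supplementary-plane characters when the string is built with `String.fromCodePoint`.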
Jonas Jenwald	22468817e1	Add a `settled` property, tracking the fulfilled/rejected state of the Promise, to `createPromiseCapability` This allows cleaning-up code which is currently manually tracking the state of the Promise of a `createPromiseCapability` instance.	2019-02-02 15:18:56 +01:00
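A minimal sketch of what a `settled` property on a promise capability might look like is shown below. It illustrates the idea from the commit message and is not necessarily identical to the code in the patch; the internal `isSettled` flag is an assumption.

```javascript
// Sketch: a promise capability that exposes whether the underlying
// Promise has been fulfilled or rejected.
function createPromiseCapability() {
  const capability = Object.create(null);
  let isSettled = false; // Assumed internal flag, not visible to callers.

  Object.defineProperty(capability, 'settled', {
    get() {
      return isSettled;
    },
  });
  capability.promise = new Promise(function(resolve, reject) {
    capability.resolve = function(data) {
      isSettled = true;
      resolve(data);
    };
    capability.reject = function(reason) {
      isSettled = true;
      reject(reason);
    };
  });
  return capability;
}

// Usage: callers no longer need to track the state by hand.
const capability = createPromiseCapability();
const settledBefore = capability.settled;
capability.resolve('done');
const settledAfter = capability.settled;
```

Exposing the flag through a getter keeps it read-only, so only `resolve` and `reject` can change it.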
Jonas Jenwald	2b0b6178f7	Clean-up after the `gets operatorList with JPEG image (issue 4888)` unit-test This unit-test wasn't destroying the `loadingTask` when complete, as it should have done.	2019-01-29 15:24:08 +01:00
Jonas Jenwald	6f94a05a29	Do the final text scaling correctly in `flushTextContentItem` (issue 8276) It's necessary to take into account whether or not the text is vertical, to avoid either the textContent `width` or `height` becoming incorrect.	2019-01-29 15:24:04 +01:00
Tim van der Meij	e2701d5422	Merge pull request #10482 from janpe2/indexed-decode Implement Decode entry in Indexed images	2019-01-24 23:46:55 +01:00
Jonas Jenwald	41fbc71ef9	Ensure that `XRef.indexObjects` can handle object numbers with zero-padding (issue 10491) All objects in the PDF document follow this pattern: ``` 0000000001 0 obj << % Some content here... >> endobj 0000000002 0 obj << % More content here... endobj ```	2019-01-24 22:37:18 +01:00
Jani Pehkonen	26121177ab	Implement Decode entry in Indexed images	2019-01-22 22:51:04 +02:00
Tim van der Meij	c4fe4087d3	Implement a unit test for metadata parsing to ensure that it's not vulnerable to the billion laughs attack	2019-01-19 19:54:08 +01:00
Jonas Jenwald	24a688d6c6	Convert some usage of `indexOf` to `startsWith`/`includes` where applicable In many cases in the code you don't actually care about the index itself, but rather just want to know if something exists in a String/Array or if a String starts in a particular way. With modern JavaScript functionality, it's thus possible to remove a number of existing `indexOf` cases.	2019-01-18 17:57:41 +01:00
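The kind of conversion described in the commit above can be sketched as follows. The variable names and values are illustrative only and are not taken from the patch.

```javascript
const url = 'http://example.com/file.pdf';
const fontTypes = ['Type1', 'Type3'];

// Before: the index is computed but only used as a yes/no answer.
const startsBefore = url.indexOf('http') === 0;
const hasBefore = fontTypes.indexOf('Type3') !== -1;
const containsBefore = url.indexOf('.pdf') >= 0;

// After: the intent is stated directly.
const startsAfter = url.startsWith('http');
const hasAfter = fontTypes.includes('Type3');
const containsAfter = url.includes('.pdf');
```

Both forms give the same answers here; the newer methods simply remove the comparison against `0` or `-1`.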
Jonas Jenwald	9f45f8dfda	When parsing Metadata, attempt to remove "junk" before the first tag (PR 10398 follow-up) This will allow the Metadata to be successfully extracted from the PDF file in issue 10395. Furthermore, this patch also fixes a bug in `Metadata.get` which causes the method to return `null` rather than an empty string or zero (since either ought to be allowed).	2019-01-16 12:44:27 +01:00
Jonas Jenwald	5d90224409	Add a unit-test for issue 10395 (PR 10398 follow-up)	2019-01-16 11:30:36 +01:00
Tim van der Meij	5cb00b7967	Merge pull request #10443 from Snuffleupagus/getVisibleElements-fixes Prevent `TypeError: views[index] is undefined` being thrown in `getVisibleElements` when the viewer, or all pages, are hidden	2019-01-13 15:41:48 +01:00
Tim van der Meij	6279fc601a	Handle malformed URIs as bad requests in the development webserver Fixes #10445 (found by Dhiraj Mishra).	2019-01-13 14:57:20 +01:00
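One way a server could treat malformed URIs as bad requests is sketched below. This is an assumption-laden illustration, not the development webserver's actual code: `resolveRequestPath` and its return shape are hypothetical, and the only built-in relied upon is `decodeURI`, which throws a `URIError` on malformed escape sequences.

```javascript
// Hypothetical helper: decode the request URL and report a 400 status
// instead of letting the URIError propagate and crash the handler.
function resolveRequestPath(rawUrl) {
  try {
    return { status: 200, path: decodeURI(rawUrl) };
  } catch (ex) {
    if (ex instanceof URIError) {
      return { status: 400, path: null }; // Bad Request
    }
    throw ex; // Unrelated errors are not swallowed.
  }
}

const good = resolveRequestPath('/web/viewer.html');
const bad = resolveRequestPath('/%E0%A4%A'); // Incomplete escape sequence.
```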
Jonas Jenwald	b2235ec9c4	Add a unit-test to check that the `sortByVisibility` parameter, in `getVisibleElements`, works correctly	2019-01-13 11:34:38 +01:00
Jonas Jenwald	9743708a24	Prevent `TypeError: views[index] is undefined` being thrown in `getVisibleElements` when the viewer, or all pages, are hidden Previously a couple of different attempts at fixing this problem have been rejected, given how crucial this code is for the correct function of the viewer, since no one has thus far provided any evidence that the problem actually affects the default viewer[1] nor an example using the viewer components directly (without another library on top). The fact that none of the prior patches contained even a simple unit-test probably contributed to the unwillingness of a reviewer to sign off on the suggested changes. However, it turns out that it's possible to create a reduced test-case, using the default viewer, that demonstrates the error[2]. Since this utilizes a hidden `<iframe>`, please note that this error will thus affect Firefox as well. Note that while errors are thrown when the hidden `<iframe>` loads, the default viewer doesn't break completely since rendering does start working once the `<iframe>` becomes visible (although the errors do break the initial Toolbar state). Before making any changes here, I carefully read through not just the immediately relevant code but also the rendering code in the viewer (given its dependence on `getVisibleElements`). After concluding that the changes should be safe in general, the default viewer was tested without any issues found. (The above being much easier with significant prior experience of working with the viewer code.) Finally the patch also adds new unit-tests, one of which explicitly triggers the relevant code-path and will thus fail with the current `master` branch. This patch also makes `PDFViewerApplication` slightly more robust against errors during document opening, to ensure that viewer/document initialization always completes as expected. Please keep in mind that even though this patch prevents an error in `getVisibleElements`, it's still not possible to set the initial position/zoom level/sidebar view etc. when the viewer is hidden since rendering and scrolling is completely dependent[3] on being able to actually access the DOM elements. --- [1] And hence the PDF Viewer that's built-in to Firefox. [2] Copy the HTML code below and save it as `iframe.html`, and place the file in the `web/` folder. Then start the server, with `gulp server`, and navigate to http://localhost:8888/web/iframe.html ```html <!DOCTYPE html> <html> <head> <title>Iframe test</title> <script> window.onload = function() { const button = document.getElementById('button1'); const frame = document.getElementById('frame1'); button.addEventListener('click', function(evt) { frame.hidden = !frame.hidden; }); }; </script> </head> <body> <button id="button1">Toggle iframe</button> <br> <iframe id="frame1" width="800" height="600" src="http://localhost:8888/web/viewer.html" hidden="true"></iframe> </body> </html> ``` [3] This is an old, pre-existing, issue that's not relevant to this patch as such (and it's already being tracked elsewhere).	2019-01-13 11:34:24 +01:00
Tim van der Meij	ed918bad21	Remove left-over console log from the find controller unit tests	2019-01-12 22:27:40 +01:00
Tim van der Meij	a37ea16013	Merge pull request #10425 from timvandermeij/find-controller-unit-tests Write more unit tests for the find controller	2019-01-12 22:20:53 +01:00
Tim van der Meij	b1cef896f4	Write more unit tests for the find controller Fixes #7356.	2019-01-12 22:17:46 +01:00
Jonas Jenwald	b531fc4106	Avoid truncating inline images, where the data and the "EI" marker are glued together (issue 10388) (#10436) Thanks to the excellent debugging done by @janpe2, this was easy to fix!	2019-01-12 20:31:23 +01:00
Jonas Jenwald	d4a3858ed5	Handle more cases of corrupt PDF files with missing 'endobj' operators, where the "obj" string is immediately followed by the dictionary (PR 9288 follow-up)	2019-01-10 17:55:28 +01:00
Tim van der Meij	b81984f0cb	Merge pull request #10417 from brendandahl/metric-length Fix reading number of HMTX metrics.	2019-01-05 13:35:16 +01:00
Jonas Jenwald	e8f4b47d59	Prevent errors, in `SimpleXMLParser.onEndElement`, when the stack has already been completely parsed (issue 10410) The error was triggered for a particular set of metadata, where an end tag was encountered without the corresponding begin tag being present in the data. (The patch also fixes a minor oversight, from a recent PR, in the `SimpleDOMNode.nextSibling` method.)	2019-01-05 11:15:34 +01:00
Brendan Dahl	32eace043b	Fix reading number of HMTX metrics. The length of the HHEA table can be incorrect, so it is better to read the number of metrics offset from beginning of table instead.	2019-01-04 15:13:13 -08:00
Tim van der Meij	b39ec7af96	Merge pull request #10408 from Snuffleupagus/issue-10407 Prevent errors, because of incorrect scope, in the `XMLParserBase._resolveEntities` method (issue 10407)	2019-01-04 23:45:26 +01:00
Jonas Jenwald	66fccd860b	Adjust how `AnnotationBorderStyle.setWidth` handles the input being a `Name` (issue 10385) In order to be consistent with the behaviour in Adobe Reader, the width will now always be set to zero when the input is a `Name`.	2019-01-04 10:38:10 +01:00

1835 Commits