pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	327f2eb588	Ensure that `onProgress` is always called when the entire PDF file has been loaded, regardless of how it was fetched (issue 10160) Please note: I'm totally fine with this patch being rejected, and the issue closed as WONTFIX; however these changes should address the issue if that's desired. From a conceptual point of view, reporting loading progress doesn't really make a lot of sense for PDF files opened by passing raw binary data directly to `getDocument` (since obviously all data was loaded). This is compared to PDF files loaded via e.g. `XMLHttpRequest` or the Fetch API, where the entire PDF file isn't available from the start and knowing the loading progress makes total sense. However I can certainly see why the current API could be considered inconsistent, which isn't great, since a registered `onProgress` callback will never be called for certain `getDocument` calls. The simplest solution to this inconsistency thus seems to be to ensure that `onProgress` is always called when handling the `DataLoaded` message, since that will always be dispatched[1] from the worker-thread. --- [1] Note that this isn't guaranteed to happen, since setting `disableAutoFetch = true` often prevents the entire file from ever loading. However, this isn't relevant for the issue at hand, and is a well-known consequence of using `disableAutoFetch = true`; note how the default viewer even has a specialized code-path for hiding the loadingBar.	2018-10-16 13:51:12 +02:00
Kevin Lee Drum	4cf10ac79d	set returnValues.suggestedLength to Content-Length if integer	2018-10-07 13:26:29 -04:00
Tim van der Meij	ff2df9c5b6	Merge pull request #10117 from leblanc-simon/ink-annotation-support Add support of Ink annotation	2018-10-04 23:39:41 +02:00
Jonas Jenwald	2ed3591b22	Make `PDFFindController` less confusing to use, by allowing searching to start when `setDocument` is called This patch is based on something that I noticed while working on PR 10126. The recent re-factoring of `PDFFindController` brought many improvements, among those the fact that access to `BaseViewer` is no longer required. However, with these changes there's one thing which now strikes me as not particularly user-friendly[1]: The fact that in order for searching to actually work, `PDFFindController.setDocument` must be called and a 'pagesinit' event must be dispatched (from somewhere). For all other viewer components, calling the `setDocument` method[2] is enough in order for the component to actually be usable. The `PDFFindController` thus stands out quite a bit, and it also becomes difficult to work with in any sort of custom implementation. For example: Imagine someone trying to use `PDFFindController` separately from the viewer[3], which should now be relatively simple given the re-factoring, and thus having to (somehow) figure out that they'll also need to manually dispatch a 'pagesinit' event for searching to work. Note that the above even affects the unit-tests, where an out-of-place 'pagesinit' event is being used. To attempt to address these problems, I'm thus suggesting that only `setDocument` should be used to indicate that searching may start. For the default viewer and/or the viewer components, `BaseViewer.setDocument` will now call `PDFFindController.setDocument` when the document is ready, thus requiring no outside configuration anymore[4]. For custom implementations, and the unit-tests, it's now as simple as just calling `PDFFindController.setDocument` to allow searching to start. --- [1] I should have caught this during review of PR 10099, but unfortunately it's sometimes not until you actually work with the code in question that things like these become clear. [2] Assuming, obviously, that the viewer component in question actually implements such a method :-) [3] There's even a very recent issue, filed by someone trying to do just that. [4] Short of providing a `PDFFindController` instance when creating a `BaseViewer` instance, of course.	2018-10-04 10:28:50 +02:00
Simon Leblanc	b5806735d8	Add support of Ink annotation	2018-10-03 00:28:49 +02:00
Tim van der Meij	1b402996cf	Implement a basic unit test for the find controller This commit shows that we can now unit test the find controller and that executing regular queries works. Note that this is only a first step and not a complete suite of unit tests for all possible options of the find controller. While writing this unit test, I found two smaller issues that I addressed directly. The first one is that in the previous find controller refactoring I forgot to rename some occurrences of a now private member variable. Fortunately this did not cause any bugs since we did have a public getter and the fetched value may be changed by reference, but it's nevertheless good to fix. The second issue is that some entries in the `test/unit/clitests.json` file were not correct, resulting in these tests not being executed on e.g., Travis CI.	2018-09-30 18:32:34 +02:00
Jonas Jenwald	1c814e208e	Prevent `getPDFFileNameFromURL` from breaking if the `url` parameter is not a string	2018-09-30 12:28:59 +02:00
Jonas Jenwald	842e9206c0	Replace `String.prototype.substr()` occurrences with `String.prototype.substring()` As outlined in https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substr, which refers to the ECMA-262 specification, using the `substr` function is advised against. Hence this PR, which replaces all remaining `substr` occurrences with `substring` instead. Please refer to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substr#Syntax respectively https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substring#Syntax for the differences between the two functions. Note that in most cases in the code-base there's only one argument passed to `substr`, and those require no other changes except replacing "substr" with "substring". For the other cases, the `substr(start, length)` calls are changed to `substring(start, start + length)` instead.	2018-09-28 11:41:07 +02:00
Jonas Jenwald	39776168a2	Add `EventBus` unit-tests to ensure that the (optional) argument handling works correctly	2018-09-21 14:31:35 +02:00
Jonas Jenwald	f317a2cb40	Ensure that the DOM event listeners are removed at the end of the relevant `EventBus` unit-tests, to prevent the tests from interfering with each other	2018-09-20 23:12:01 +02:00
Tim van der Meij	99de25d6cc	Implement unit tests for the `isSameOrigin` and `createValidAbsoluteUrl` utility functions Moreover, mark the `isValidProtocol` function as private since it's only used in the utilities file and is not (meant to be) exported.	2018-09-11 16:17:45 +02:00
Jonas Jenwald	6d804d657f	Add initial support for "Whole words" searching in the viewer As outlined in https://bugzilla.mozilla.org/show_bug.cgi?id=1282759 the internal Firefox name for the feature is `entireWord`, hence that name is used here as well for consistency (with "Whole words" being limited to the UI). Given existing limitations of the PDF.js search functionality, e.g. the existing problems of searching across "new lines", there are some edge-cases where "Whole words" searching will ignore (valid) results. However, considering that this is a pre-existing issue related to the way that the find controller joins text-content together, that shouldn't have to block this new feature in my opinion. Please note: In order to enable this feature in the `MOZCENTRAL` version, a small follow-up patch for [PdfjsChromeUtils.jsm](https://hg.mozilla.org/mozilla-central/file/tip/browser/extensions/pdfjs/content/PdfjsChromeUtils.jsm) will be required once this has landed in `mozilla-central`.	2018-09-10 11:59:29 +02:00
Tim van der Meij	66422eb83e	Merge pull request #9340 from brendandahl/private-use Map all glyphs to the private use area and duplicate the first glyph.	2018-09-08 17:51:04 +02:00
Brendan Dahl	b76cf665ec	Map all glyphs to the private use area and duplicate the first glyph. There have been lots of problems with trying to map glyphs to their unicode values. It's more reliable to just use the private use areas so the browser's font renderer doesn't mess with the glyphs. Using the private use area for all glyphs did highlight other issues that this patch also had to fix: * small private use area - Previously, only the BMP private use area was used which can't map many glyphs. Now, the (much bigger) PUP 16 area can also be used. * glyph zero not shown - Browsers will not use the glyph from a font if it is glyph id = 0. This issue was less prevalent when we mapped to unicode values since the fallback font would be used. However, when using the private use area, the glyph would not be drawn at all. This is illustrated in one of the current test cases (issue #8234) where there's an "ä" glyph at position zero. The PDF looked like it rendered correctly, but it was actually not using the glyph from the font. To properly show the first glyph it is always duplicated and appended to the glyphs and the maps are adjusted. * supplementary characters - The private use area PUP 16 is 4 bytes, so String.fromCodePoint must be used where we previously used String.fromCharCode. This is actually an issue that should have been fixed regardless of this patch. * charset - Freetype fails to load fonts when the charset size doesn't match number of glyphs in the font. We now write out a fake charset with the correct length. This also brought up the issue that glyphs with seac/endchar should only ever write a standard charset, but we now write a custom one. To get around this the seac analysis is permanently enabled so those glyphs are instead always drawn as two glyphs.	2018-09-05 14:04:54 -07:00
Jonas Jenwald	e5a6d892b4	Revert "Attempt to combine separate beginText/endText sequences in `getTextContent` (issue 9984)"	2018-09-05 18:01:33 +02:00
Tim van der Meij	e812c6e7ac	Use shorter code for failing a test in `test/unit/api_spec.js`	2018-09-02 21:23:09 +02:00
Tim van der Meij	959ed3705b	Implement a permissions API	2018-09-02 21:23:09 +02:00
Tim van der Meij	a096e0c647	Merge pull request #10032 from timvandermeij/test-link Replace broken link for `pr8808.pdf.link`	2018-09-02 14:52:21 +02:00
Tim van der Meij	b62f14f3f5	Replace broken link for `pr8808.pdf.link` The current link had an invalid certificate and was a redirect to this new link anyway. The MD5 hash is equal.	2018-09-02 14:48:26 +02:00
Tim van der Meij	c94df0fef3	Merge pull request #9986 from Snuffleupagus/issue-9984 Attempt to combine separate beginText/endText sequences in `getTextContent` (issue 9984)	2018-09-01 21:21:29 +02:00
Tim van der Meij	f2f2e05bb8	Merge pull request #10019 from Snuffleupagus/eventBusDispatchToDOM Add general support for re-dispatching events, on `EventBus` instances, to the DOM	2018-09-01 19:11:23 +02:00
Jonas Jenwald	0b1f41c5b3	Add general support for re-dispatching events, on `EventBus` instances, to the DOM This patch is the first step to be able to eventually get rid of the `attachDOMEventsToEventBus` function, by allowing `EventBus` instances to simply re-dispatch most[1] events to the DOM. Note that the re-dispatching is purposely implemented to occur after all registered `EventBus` listeners have been serviced, to prevent the ordering issues that necessitated the duplicated page/scale-change events. The DOM events are currently necessary for the `mozilla-central` tests, see https://hg.mozilla.org/mozilla-central/file/tip/browser/extensions/pdfjs/test, and perhaps also for custom deployments of the PDF.js default viewer. Once this has landed, and been successfully uplifted to `mozilla-central`, I intend to submit a patch to update the test-code to utilize the new preference. This will thus, eventually, make it possible to remove the `attachDOMEventsToEventBus` functionality. Please note: I've successfully run all `mozilla-central` tests locally, with these patches applied. --- [1] The exception being events that originated on the `window` or `document`, since those are already globally available anyway.	2018-08-30 17:28:12 +02:00
Jonas Jenwald	95e5bad4c4	Attempt to find truncated endstream commands, in the fallback code-path, in `Parser.makeStream` (issue 10004) Apparently there are some PDF generators, in this case the culprit is "Nooog Pdf Library / Nooog PStoPDF v1.5", that manage to mess up PDF creation enough that endstream[1] commands actually become truncated. Please note: The solution implemented here isn't perfect, since it won't be able to cope with PDF files that contain a mixture of correct and truncated endstream commands. However, considering that this particular mode of corruption fortunately doesn't seem very common[2], a slightly less complex solution ought to suffice for now. Fixes 10004. --- [1] Scanning through the PDF data to find endstream commands becomes necessary, in order to determine the stream length in cases where the `Length` entry of the (stream) dictionary is missing/incorrect. [2] I cannot recall having seen any (previous) issues/bugs with "Missing endstream" errors.	2018-08-26 11:51:11 +02:00
Jonas Jenwald	497b765ede	Attempt to combine separate beginText/endText sequences in `getTextContent` (issue 9984) Please note that while this improves issue 9984 slightly (and likely others too), it's not a complete solution. The remaining issues are related to the, more general, problems with the existing heuristics related to attempting to combine separate text items.	2018-08-18 13:45:32 +02:00
Tim van der Meij	1268aea2b6	Merge pull request #9975 from Snuffleupagus/getDestination-refactor Re-factor `destinations`/`getDestination` to reduce unnecessary duplication, and reject non-string inputs	2018-08-12 15:51:58 +02:00
Tim van der Meij	af19ed6ee9	Merge pull request #9822 from timvandermeij/annotations [api-minor] Refactor the annotation code to be asynchronous	2018-08-11 20:39:50 +02:00
Tim van der Meij	bbc769cf81	Convert `test/unit/annotation_spec.js` to ES6 syntax	2018-08-11 19:00:29 +02:00
dmitryskey	3741becb9b	[api-minor] Refactor the annotation code to be asynchronous This commit is the first step towards implementing parsing for the appearance streams of annotations. Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com> Co-authored-by: Tim van der Meij <timvandermeij@gmail.com>	2018-08-11 19:00:29 +02:00
Jonas Jenwald	1179584fd6	Reject `getDestination`, in the API, for non-string inputs Note how e.g. the `getPage` method does basic validation of the input.	2018-08-11 16:06:35 +02:00
Jonas Jenwald	f78efd883e	Attempt to throw `MissingPDFException` when applicable in `node_stream.js` (issue 9791)	2018-08-06 10:00:03 +02:00
Tim van der Meij	f6eaa99cb2	Reword test reporter message The font tests use Jasmine too, so while they are technically unit tests, it's a bit confusing to see `Started unit tests` when the font tests are run on the bots.	2018-08-05 21:21:46 +02:00
Tim van der Meij	4111871ac5	Merge pull request #9958 from brendandahl/always-fallback Always fallback to system font on font failure.	2018-08-05 19:58:48 +02:00
Tim van der Meij	27e8a2f6fe	Merge pull request #9959 from brendandahl/test-util Utility script to add a reference test.	2018-08-05 16:53:37 +02:00
Tim van der Meij	b65d0450f5	Merge pull request #9960 from brendandahl/strict-verify Fail when MD5 of test files fails on bots.	2018-08-05 16:44:12 +02:00
Brendan Dahl	482ea2af32	Fail when MD5 of test files fails on bots.	2018-08-03 17:48:47 -07:00
Brendan Dahl	8b3ed473c1	Utility script to add a reference test.	2018-08-03 17:24:24 -07:00
Brendan Dahl	5f67a6a237	Always fallback to system font on font failure. The font in the PDF is marked as a CIDFontType0, but the font file is actually a true type font. To fully address this issue we should really peek into the font file and try to determine what it is. However, this is the first case of this issue, so I think this solution is acceptable for now.	2018-08-03 16:49:22 -07:00
Tim van der Meij	444976bcd5	Merge pull request #9956 from brendandahl/allow-zero-progress Allow loaded progress of 0 in unit tests.	2018-08-04 00:19:02 +02:00
Tim van der Meij	f19ee127a3	Merge pull request #9874 from boundlesshq/master [api-minor] Include export value for checkboxes	2018-08-03 23:43:23 +02:00
Brendan Dahl	d762567bcf	Allow loaded progress of 0 in unit tests.	2018-08-03 10:31:46 -07:00
Tim van der Meij	8a4be24645	Merge pull request #9948 from Snuffleupagus/url-polyfill-unit-tests Add (basic) unit-tests for the non-global `URL` constructor (PR 9868 follow-up)	2018-08-02 23:32:07 +02:00
Brian	2a665ebad4	Removed Extraneous Matrix Check in CalRGB Conversion	2018-08-02 10:16:42 -07:00
Jonas Jenwald	f8388710e6	Add (basic) unit-tests for the non-global `URL` constructor (PR 9868 follow-up) This should really have been included in PR 9868, since it will help ensure that the `URL` constructor is correctly imported/exported by `src/shared/util.js`.	2018-08-02 10:32:06 +02:00
Tim van der Meij	716acf63d4	Merge pull request #9938 from Snuffleupagus/issue-9915 Ensure that Type0, i.e. composite, OpenType fonts with `CFF ` tables are not treated as CFF fonts if their glyph mapping is non-default (issue 9915)	2018-08-02 00:11:18 +02:00
Jonas Jenwald	3ce420131f	Prefer the Width/Height of the image data, rather than the image dictionary, for JPEG 2000 images (issue 9650) According to the PDF specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=45 > When using the JPXDecode filter with image XObjects, the following changes to and constraints on some entries in the image dictionary shall apply (see 8.9.5, "Image Dictionaries" for details on these entries): > > - Width and Height shall match the corresponding width and height values in the JPEG2000 data. > > - . . . Hence it seems reasonable to use the Width/Height of the image data itself, rather than the image dictionary when there's a mismatch. Given that JPEG 2000 images are already being parsed, in order to obtain basic parameters, the actual Width/Height is readily available in the `PDFImage` constructor.	2018-08-01 16:42:26 +02:00
Jonas Jenwald	690bcc8c8a	Add a reduced, `eq`, test-case for issue 9915	2018-07-29 23:06:15 +02:00
bion	c31ddf7edc	[api-minor] Include export value for checkboxes	2018-07-28 00:30:41 -07:00
Jonas Jenwald	928b89382e	[api-minor] Add an `IsLinearized` property to the `PDFDocument.documentInfo` getter, to allow accessing the linearization status through the API (via `PDFDocumentProxy.getMetadata`) There was a (somewhat) recent question on IRC about accessing the linearization status of a PDF document, and this patch contains a simple way to expose that through already existing API methods. Please note that during setup/parsing in `PDFDocument` the linearization data is already being fetched and parsed, provided of course that it exists. Hence this patch will not cause any additional data to be loaded.	2018-07-26 15:54:19 +02:00
Jonas Jenwald	36b683ca55	Provide custom messages for the `no-restricted-globals` ESLint rule, and refactor the `.eslintrc` files (PR 9868 follow-up) Without providing useful (custom) error messages for the `no-restricted-globals` rule, see https://eslint.org/docs/rules/no-restricted-globals, it's quite likely that the rule will be incorrectly disabled rather than the required globals being imported as intended. To reduce duplication of the `no-restricted-globals` rule in multiple `.eslintrc` files, it's instead moved to the top-level `.eslintrc` file and disabled as needed on a folder/file basis outside of `/src` and `/web`.	2018-07-23 14:10:13 +02:00
Jonas Jenwald	8ec99b200c	Prevent Metadata/XML parsing from breaking `PDFDocumentProxy.getMetadata` when no XML root document is found (issue 8884) With the new XML parser, see PR 9573, the referenced PDF file now causes `getMetadata` to fail when incomplete XML tags are encountered. This provides a simple, and hopefully generally useful, work-around that may also help prevent future bugs. (Without being able to reproduce nor even understand the other (non XML) errors mentioned in issue 8884, I'd say that this patch is enough to close that one as fixed.)	2018-07-18 11:37:40 +02:00
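
Commit 842e9206c0 above describes replacing `substr(start, length)` calls with `substring(start, start + length)`. The following is a minimal sketch of that conversion, not code taken from the PDF.js tree; the helper name `substrToSubstring` and the sample string are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper for illustration only; it is not part of PDF.js.
function substrToSubstring(str, start, length) {
  // `substring` takes an end index rather than a length,
  // so the end index is computed as start + length.
  return str.substring(start, start + length);
}

const sample = "endstream";

// One-argument calls need no change beyond renaming the method.
const tail = sample.substring(3);

// Two-argument calls convert (start, length) into (start, end).
const middle = substrToSubstring(sample, 3, 3);

console.log(tail, middle);
```

Note that the two methods only agree for non-negative `start` values: `substr` treats a negative start as an offset from the end of the string, whereas `substring` clamps it to zero, so call sites that rely on negative indices would need separate handling.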


2119 Commits
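
Commit b76cf665ec in the listing above notes that `String.fromCodePoint` must be used instead of `String.fromCharCode` once glyphs are mapped beyond the BMP private use area. The sketch below illustrates why, assuming the larger area mentioned there starts at U+100000 (plane 16); the variable names are hypothetical and nothing here is taken from the PDF.js font code.

```javascript
// Assumption: U+100000 is used as an example code point in the
// plane 16 private use area; the actual mapping in PDF.js may differ.
const codePoint = 0x100000;

// fromCodePoint encodes the value as a UTF-16 surrogate pair,
// so the original code point can be recovered.
const viaCodePoint = String.fromCodePoint(codePoint);

// fromCharCode keeps only the low 16 bits of its argument,
// so the supplementary code point is silently lost.
const viaCharCode = String.fromCharCode(codePoint);

console.log(viaCodePoint.length, viaCodePoint.codePointAt(0).toString(16));
console.log(viaCharCode.length, viaCharCode.charCodeAt(0).toString(16));
```

The surrogate pair has a string length of two, which is worth keeping in mind for any code that indexes glyph strings by UTF-16 unit rather than by code point.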