pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	363e517acf	Remove the `HTMLElement.dataset` polyfill This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.	2017-12-19 14:50:18 +01:00
Jonas Jenwald	4880200cd4	Remove the `XMLHttpRequest.response` polyfill This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.	2017-12-19 14:48:43 +01:00
Jonas Jenwald	8266cc18e7	Remove the `webkitURL` polyfill This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.	2017-12-19 14:46:04 +01:00
Soumya Himanish Mohapatra	95ad956f68	PDFjs now compatible with Librejs	2017-12-19 15:13:50 +05:30
Jonas Jenwald	1dc54ddb40	Handle PDF files with missing 'endobj' operators, by searching for the "obj" string rather than "endobj" in `XRef.indexObjects` (issue 9105) This patch refactors the searching for 'endobj', to try and find the next occurrence of "obj" and then check if it was in fact an 'endobj' and continue searching otherwise. This approach is used to avoid having to first find 'endobj', and then re-check the entire contents of the object and having to run (potentially expensive) regular expressions on arbitrarily long strings. Fixes 9105.	2017-12-18 13:17:45 +01:00
Tim van der Meij	6bbe91079b	Merge pull request #9272 from nveenjain/fix/8846 Replaced occurrence of `throw new Error` with `unreachable`	2017-12-15 22:11:32 +01:00
Jonas Jenwald	6515b91118	Merge pull request #9276 from mozilla/loca-fix Fix loca table when offsets aren't in ascending order.	2017-12-15 20:59:42 +01:00
Brendan Dahl	9b51cea724	Fix loca table when offsets aren't in ascending order.	2017-12-15 11:20:28 -06:00
Naveen Jain	1135674647	Replaced occurrence of `throw new Error` with `unreachable` where applicable	2017-12-14 12:58:50 +05:30
Jonas Jenwald	ad5ed37059	Handle broken, Ghostscript generated, Metadata that contains HTML character names (bug 1424938) Please note that while this could be considered a regression in user-facing behaviour, I'm not convinced that it's really a regression as such since prior to PR 8912 the Metadata would fail to parse (with an XML error) and thus be ignored when setting the viewer title. With the refactored Metadata parsing we're now able to parse this, which uncovered issues with a subset of broken Ghostscript Metadata that uses HTML character names. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1424938	2017-12-13 14:32:47 +01:00
Tim van der Meij	095c63cc25	Merge pull request #9260 from Snuffleupagus/rm-JpegStream.getBytes Attempt to remove the special `JpegStream.getBytes` method and utilize the regular `DecodeStream` one instead	2017-12-10 16:50:50 +01:00
Tim van der Meij	c35bbd11b0	Use native `Math` functions in the custom `log2` function It is quite confusing that the custom function is called `log2` while it actually returns the ceiling value and handles zero and negative values differently than the native function. To resolve this, we add a comment that explains these differences and make the function use the native `Math` functions internally instead of using our own custom logic. To verify that the function does what we expect, we add unit tests. All browsers except for IE have supported `Math.log2` for quite a long time already (see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/log2). For IE, we use the core-js polyfill. According to the microbenchmark at https://jsperf.com/log2-pdfjs/1, using the native functions should also be faster, in my testing almost six times as fast.	2017-12-10 16:35:17 +01:00
Jonas Jenwald	84de1e9a92	Attempt to remove the special `JpegStream.getBytes` method and utilize the regular `DecodeStream` one instead Note that no other image stream implements a special `getBytes` method, which makes `JpegStream` look somewhat odd. I'm actually not sure what purpose this method serves, since I successfully ran all tests locally with it commented out. Furthermore, I also ran tests with an added `if (length && length !== this.bufferLength) { throw new Error('length mismatch'); }` check, and didn't get a single test failure in that case either. Looking at the history, it seems that this code originated back in PR 4528, but as far as I can tell there's no mention in either commit messages or PR comments of why it was necessary to add a "special" `getBytes` function for the `JpegStream`. My assumption is that there's a good reason why this method was added, e.g. to address a specific regression in one of the reference tests. However, I did check out commit `58f697f977` locally and ran tests with this method commented out, and there didn't seem to be any image-related failures in that case either!? Hence I'm suggesting that we attempt to simplify this code slightly by removing this special `getBytes` method. However, please note that there's perhaps a small risk of regressions in an edge-case where we currently have insufficient test-coverage.	2017-12-10 13:31:08 +01:00
Brendan Dahl	af1d80d45e	Merge pull request #9230 from Snuffleupagus/issue-9195 Add basic support for non-embedded Calibri fonts (issue 9195)	2017-12-08 10:15:43 -08:00
Jonas Jenwald	a5e3261b48	Merge pull request #9062 from mozilla/no_high Move char codes from high surrogate pair range into private use.	2017-12-08 12:31:22 +01:00
Brendan Dahl	306999c325	Move char codes from high surrogate pair range into private use. Fixes #2884	2017-12-07 10:35:50 -08:00
Jonas Jenwald	7c5ba9aad5	[api-major] Only create a `StatTimer` for pages when `enableStats == true` (issue 5215) Unless the debugging tools (i.e. `PDFBug`) are enabled, or the `browsertest` is running, the `PDFPageProxy.stats` aren't actually used for anything. Rather than initializing unnecessary `StatTimer` instances, we can simply re-use one dummy class (with static methods) for every page. Note that by using a dummy `StatTimer` in this way, rather than letting `PDFPageProxy.stats` be undefined, we don't need to guard every single stats collection callsite. Since it wouldn't make much sense to attempt to use `PDFPageProxy.stats` when stat collection is disabled, it was instead changed to a "private" property (i.e. `PDFPageProxy._stats`) and a getter was added for accessing `PDFPageProxy.stats`. This getter will now return `null` when stat collection is disabled, making that case easy to handle. For benchmarking purposes, the test-suite used to re-create the `StatTimer` after loading/rendering each page. However, modifying properties on various API code from the outside in this way seems very error-prone, and is an anti-pattern that we really should avoid at all cost. Hence the `PDFPageProxy.cleanup` method was modified to accept an optional parameter, which will take care of resetting `this.stats` when necessary, and `test/driver.js` was updated accordingly. Finally, a tiny bit more validation was added on the viewer side, to ensure that all the code we're attempting to access is defined when handling `PDFPageProxy` stats.	2017-12-06 23:12:25 +01:00
Jonas Jenwald	50b72dec6e	Convert `StatTimer` to an ES6 class	2017-12-06 13:59:03 +01:00
Jonas Jenwald	6b1eda3e12	Move `StatTimer` from `src/shared/util.js` to `src/display/dom_utils.js` Since the `StatTimer` is not used in the worker, duplicating this code on both the main and worker sides seems completely unnecessary.	2017-12-06 13:51:04 +01:00
Jonas Jenwald	08de655177	Add basic support for non-embedded Calibri fonts (issue 9195) There's a number of issues with the fonts in the referenced PDF file. First of all, they contain broken `ToUnicode` data (`NUL` bytes all over the place). However even if you skip those, the `ToUnicode` data appears to contain nothing but an `IdentityH` CMap which won't help provide a proper glyph mapping. The real issue actually turns out to be that the PDF file uses the "Calibri" font[1], but doesn't include any font files. Since that one isn't a standard font, and uses a fairly different CID to GID map compared to the standard fonts, we're not able to render the file even remotely correctly. To work around this, I'm thus proposing that we include an (incomplete) glyph map for Calibri, and fall back to the standard Helvetica font. Obviously this isn't going to look perfect, but it's really the best that we can hope to achieve given that the PDF file is missing the necessary font data. Finally, please note that none of the PDF readers I've tried (Adobe Reader, PDFium in Chrome) were able to extract the text (which isn't very surprising, given the broken `ToUnicode` data). Fixes 9195. --- [1] According to Wikipedia, see https://en.wikipedia.org/wiki/Calibri, Calibri is (primarily) a Windows font.	2017-12-03 17:23:33 +01:00
Jonas Jenwald	f3c50fe2f9	Merge pull request #9192 from Snuffleupagus/issue-8229 Build a fallback `ToUnicode` map for simple fonts (issue 8229)	2017-11-30 10:27:32 +01:00
Tim van der Meij	e320243870	Merge pull request #9206 from janpe2/svg-inv-images Fix inverted 1-bit images in SVG backend	2017-11-28 22:46:43 +01:00
Jani Pehkonen	58b214eab3	Fix inverted 1-bit images in SVG backend	2017-11-28 21:24:27 +02:00
Jani Pehkonen	06d083b04b	Fix pattern-filled text	2017-11-28 19:40:22 +02:00
Tim van der Meij	3e34eb31d9	Merge pull request #9191 from timvandermeij/pushbuttons Button widget annotations: implement support for pushbuttons	2017-11-27 22:31:07 +01:00
Jonas Jenwald	61e19bee43	Build a fallback `ToUnicode` map for simple fonts (issue 8229) In some fonts, the included `ToUnicode` data is incomplete, causing text-selection to not work properly. For simple fonts that contain encoding data, we can manually build a `ToUnicode` map to attempt to improve things. Please note that since we're currently using the `ToUnicode` data during glyph mapping, in an attempt to avoid rendering regressions, I purposely didn't want to amend the original `ToUnicode` data for this text-selection edge-case. Instead, I opted for the current solution, which will (hopefully) give slightly better text-extraction results in PDF files with incomplete `ToUnicode` data. According to the PDF specification, see [section 9.10.2](http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1873172): > A conforming reader can use these methods, in the priority given, to map a character code to a Unicode value. > ... Reading that paragraph literally, it doesn't seem too unreasonable to use different methods for different charcodes. Fixes 8229.	2017-11-26 14:45:15 +01:00
Tim van der Meij	0fe80df2a7	Button widget annotations: implement support for pushbuttons	2017-11-26 14:09:48 +01:00
Jonas Jenwald	ffbfc3c2a7	Refactor the building of `ToUnicode` maps for simple fonts into a helper method	2017-11-26 13:30:29 +01:00
Jonas Jenwald	ab1f76cc37	Remove the unused `capability` parameter from the `WorkerTransport.getPage` method That parameter, originally named `promise`, has been unused for over five years; ever since commit `f0687c4d50` in PR 1531.	2017-11-25 11:49:33 +01:00
Jonas Jenwald	59b5e14301	Split the existing `WebGLUtils` in two classes, a private `WebGLUtils` and a public `WebGLContext`, and utilize the latter in the API to allow various code to access the methods of `WebGLUtils` This patch is one (small) step on the way to reduce the general dependency on a global `PDFJS` object, for PDF.js version `2.0`.	2017-11-24 21:54:47 +01:00
Jonas Jenwald	cc47ef56ec	Remove the `onclick` polyfill for old versions of Opera This was only relevant for now obsolete versions of Opera, that use the Presto engine. According to https://en.wikipedia.org/wiki/History_of_the_Opera_web_browser#Opera_2013, the last version affected was released in 2013.	2017-11-21 11:02:14 +01:00
Jonas Jenwald	d18b2a8e73	Remove the `classList` polyfill This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.	2017-11-21 11:01:52 +01:00
Jonas Jenwald	4b15e8566b	Remove the `Function.prototype.bind` polyfill This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.	2017-11-21 11:00:55 +01:00
Jonas Jenwald	d8cb74d3e3	Remove the `btoa`/`atob` polyfills This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.	2017-11-21 11:00:55 +01:00
Jonas Jenwald	150ac0788f	Remove IE9 specific `XMLHttpRequest` polyfills that utilize `VBArray`	2017-11-21 11:00:55 +01:00
Jonas Jenwald	935c5c587f	Remove the `Object.defineProperty` polyfill This is only relevant for browsers that we don't intend to support with PDF.js version `2.0`.	2017-11-21 11:00:55 +01:00
Tim van der Meij	25b07812b9	Sanitize the display value for choice widget annotations	2017-11-18 20:37:27 +01:00
Tim van der Meij	9e8cf448b0	Merge pull request #9140 from Snuffleupagus/rm-console-polyfill Remove the `console` polyfills	2017-11-18 15:49:19 +01:00
Tim van der Meij	edaf4b3173	Merge pull request #9037 from Snuffleupagus/refactor-streams-params Re-factor how parameters are passed to the network streams	2017-11-18 15:41:15 +01:00
Tim van der Meij	ae07adf143	Merge pull request #9073 from Snuffleupagus/image-streams-fixes Fix the interface of `JpegStream`/`JpxStream`/`Jbig2Stream` to agree with the other `DecodeStream`s	2017-11-17 23:26:36 +01:00
Tim van der Meij	1d67d9dccd	Merge pull request #9131 from janpe2/svg-empty-paths Filling and stroking empty paths in SVG backend	2017-11-16 22:43:24 +01:00
Jonas Jenwald	42099c564f	Remove the `console` polyfills All browsers that we intend to support with PDF.js version 2.0 already support `console` natively.	2017-11-16 09:34:51 +01:00
Jonas Jenwald	d5174cd826	Remove the `requestAnimationFrame` polyfill According to https://developer.mozilla.org/en-US/docs/Web/API/window/requestAnimationFrame#Browser_compatibility and https://caniuse.com/#feat=requestanimationframe, the browsers we intend to support with PDF.js version 2.0 should all have native `requestAnimationFrame` support. Note that the reason for indiscriminately polyfilling `requestAnimationFrame` in iOS, see PR 4961, was apparently because of a bug in iOS 6. However, according to [Wikipedia](https://en.wikipedia.org/wiki/IOS_version_history#iOS_8): "Support for iOS 8 ended in 2017.", hence the lowest version currently supported is iOS 9.	2017-11-15 16:08:48 +01:00
Jani Pehkonen	4e8f7070da	Filling and stroking empty paths in SVG backend	2017-11-14 18:35:39 +02:00
Jonas Jenwald	745cb73c65	Remove `PDFJS.disableRange`/`PDFJS.disableStream` code for now unsupported browsers in `src/shared/compatibility.js` We're currently disabling range requests and streaming for a number of configurations. A couple of those will no longer be supported (with PDF.js version 2.0), hence we ought to be able to clean up the compatibility code slightly.	2017-11-14 15:28:50 +01:00
Jonas Jenwald	eb3a1f24a3	Remove the `PDFJS.disableHistory` code from `src/shared/compatibility.js` This compatibility code is only relevant for browsers that will no longer be supported (with PDF.js version 2.0), hence we ought to be able to remove it.	2017-11-14 15:28:50 +01:00
Tim van der Meij	9686f6652c	Merge pull request #9089 from yurydelendik/rm-chunks Extracts OperatorList class and prepares for streaming	2017-11-13 23:35:40 +01:00
Jonas Jenwald	23699cef1c	Re-factor how parameters are passed to the network streams This patch is the result of me starting to look into moving parameters from `PDFJS` into `getDocument` and other API methods. When familiarizing myself with the code, the signatures of the various network streams seemed to be unnecessarily cumbersome since `disableRange` is currently handled separately from other parameters. I'm assuming that the explanation for this is probably "for historical reasons", as is often the case. Hence I'd like to clean this up before we start the larger, and more invasive, `PDFJS` parameter re-factoring.	2017-11-11 11:23:29 +01:00
Jonas Jenwald	de5297b9ea	Fix the interface of `JpegStream`/`JpxStream`/`Jbig2Stream` to agree with the other `DecodeStream`s The interface of all of the "image" streams looks kind of weird, and I'm actually a bit surprised that there haven't been any errors because of it. For example: None of them actually implement `readBlock` methods, and it seems more luck than anything else that we're not calling `getBytes()` (without providing a length) for those streams, since that would trigger a code-path in `getBytes` that assumes `readBlock` to exist. To address this long-standing issue, the `ensureBuffer` methods are thus renamed to `readBlock`. Furthermore, the new `ensureBuffer` methods are now no-ops. Finally, this patch also replaces `var` with `let` in a number of places.	2017-11-11 11:22:16 +01:00
Jonas Jenwald	36593d6bbc	Move `JpegStream` and `JpxStream` to their own files	2017-11-11 11:22:16 +01:00

3542 Commits