pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	bc89edb8f0	Ensure that `Uint8ClampedArray` is used for image data transferred by `getTransfers` (PR 9802 follow-up) One of the `QueueOptimizer` cases wasn't updated to use `Uint8ClampedArray`s, which leads to inconsistent image data on the API side (but no actual rendering bugs, as far as I can tell). To prevent future errors, a non-production/test-only `assert` was added to ensure that the relevant image data only uses `Uint8ClampedArray`s.	2018-08-16 10:29:44 +02:00
Tim van der Meij	1268aea2b6	Merge pull request #9975 from Snuffleupagus/getDestination-refactor Re-factor `destinations`/`getDestination` to reduce unnecessary duplication, and reject non-string inputs	2018-08-12 15:51:58 +02:00
Tim van der Meij	af19ed6ee9	Merge pull request #9822 from timvandermeij/annotations [api-minor] Refactor the annotation code to be asynchronous	2018-08-11 20:39:50 +02:00
dmitryskey	3741becb9b	[api-minor] Refactor the annotation code to be asynchronous This commit is the first step towards implementing parsing for the appearance streams of annotations. Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com> Co-authored-by: Tim van der Meij <timvandermeij@gmail.com>	2018-08-11 19:00:29 +02:00
Jonas Jenwald	1179584fd6	Reject `getDestination`, in the API, for non-string inputs Note how e.g. the `getPage` method does basic validation of the input.	2018-08-11 16:06:35 +02:00
Jonas Jenwald	b74c813353	Re-factor `destinations`/`getDestination`, in the `Catalog`, to reduce unnecessary duplication Currently, these two methods contain the same boilerplate code for getting the /Dests data.	2018-08-11 16:04:58 +02:00
Jonas Jenwald	06d1ff5af4	Tweak the MMType1 font detection in `getFontFileType` to improve font telemetry (PR 9961 follow-up) Please note that this patch does not affect rendering in any way, however it's relevant for font telemetry[1]. According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1904956, Type1C is a valid subtype for both Type1 and MMType1 fonts. --- [1] Refer to the font telemetry results in https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2018-06-25&keys=__none__!__none__!__none__&max_channel_version=nightly%252F62&measure=PDF_VIEWER_FONT_TYPES&min_channel_version=nightly%252F59&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2018-05-07&table=0&trim=1&use_submission_date=0 See also https://github.com/mozilla/pdf.js/wiki/Enumeration-Assignments-for-the-Telemetry-Histograms#pdf_viewer_font_types for help with interpreting the data.	2018-08-08 12:18:37 +02:00
Jonas Jenwald	f78efd883e	Attempt to throw `MissingPDFException` when applicable in `node_stream.js` (issue 9791)	2018-08-06 10:00:03 +02:00
Tim van der Meij	4111871ac5	Merge pull request #9958 from brendandahl/always-fallback Always fallback to system font on font failure.	2018-08-05 19:58:48 +02:00
Jonas Jenwald	3177f6aa55	Parse the font file to determine the correct type/subtype, rather than relying on the (often incorrect) data in the font dictionary The current font type/subtype detection code is quite inconsistent/unwieldy. In some cases it will simply assume that the font dictionary is correct, in others it will somewhat "arbitrarily" check the actual font file (more of these cases have been added over the years to fix specific bugs). As is evident from e.g. issue 9949, the font type/subtype detection code is continuing to cause issues. In an attempt to get rid of these hacks once and for all, this patch instead re-factors the type/subtype detection to always parse the font file. Please note that, as far as I can tell, we still appear to need to rely on the composite font detection based on the font dictionary. However, even if the composite/non-composite detection would get it wrong, that shouldn't really matter too much given that there's basically only two different code-paths (for "TrueType-like" vs "Type1-like" fonts).	2018-08-05 11:13:16 +02:00
Jonas Jenwald	9bbca04579	Add a (basic) `isCFFFile` helper function to detect CFF font files Compared to most other font formats, the CFF doesn't have a constant header which makes it slightly more difficult to detect such font files. Please refer to the Compact Font Format specification: https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5176.CFF.pdf#G3.32094	2018-08-05 11:13:14 +02:00
Jonas Jenwald	f4db38aadf	Update the TrueType font file detection to also recognize the Mac specific header 'true' Please refer to the TrueType specification: https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6.html#ScalerTypeNote	2018-08-05 10:33:56 +02:00
Brendan Dahl	5f67a6a237	Always fallback to system font on font failure. The font in the PDF is marked as a CIDFontType0, but the font file is actually a true type font. To fully address this issue we should really peek into the font file and try to determine what it is. However, this is the first case of this issue, so I think this solution is acceptable for now.	2018-08-03 16:49:22 -07:00
Tim van der Meij	f19ee127a3	Merge pull request #9874 from boundlesshq/master [api-minor] Include export value for checkboxes	2018-08-03 23:43:23 +02:00
Jonas Jenwald	a504befc76	Stop warning for non-Name /Filter entries in the `PDFImage` constructor (PR 9897 follow-up) Fixes a stupid oversight on my part, since /Filter may (obviously) contain an Array, which resulted in unnecessary console warning spam in perfectly valid PDF files. Note that it still makes sense to check that /Filter is actually a Name, before attempting to access its `name` property, but the warning should definitely be removed.	2018-08-03 10:23:08 +02:00
Brian	2a665ebad4	Removed Extraneous Matrix Check in CalRGB Conversion	2018-08-02 10:16:42 -07:00
Tim van der Meij	716acf63d4	Merge pull request #9938 from Snuffleupagus/issue-9915 Ensure that Type0, i.e. composite, OpenType fonts with `CFF ` tables are not treated as CFF fonts if their glyph mapping is non-default (issue 9915)	2018-08-02 00:11:18 +02:00
Jonas Jenwald	3ce420131f	Prefer the Width/Height of the image data, rather than the image dictionary, for JPEG 2000 images (issue 9650) According to the PDF specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=45 > When using the JPXDecode filter with image XObjects, the following changes to and constraints on some entries in the image dictionary shall apply (see 8.9.5, "Image Dictionaries" for details on these entries): > > - Width and Height shall match the corresponding width and height values in the JPEG2000 data. > > - . . . Hence it seems reasonable to use the Width/Height of the image data itself, rather than the image dictionary when there's a mismatch. Given that JPEG 2000 images are already being parsed, in order to obtain basic parameters, the actual Width/Height is readily available in the `PDFImage` constructor.	2018-08-01 16:42:26 +02:00
Jonas Jenwald	17f65908ae	Add more validation of the /Filter entry, in image dictionaries, to the `PDFImage` constructor Given that the code is currently assuming that the /Filter entry is a `Name`, it cannot hurt to actually ensure that's the case. Also fixes an error message, for JPEG 2000 images with unsupported ColorSpaces, since `this.numComps` hasn't been initialized when it's accessed during the `throw new Error()` invocation.	2018-08-01 16:41:15 +02:00
Jonas Jenwald	17eac2d48a	Ensure that Type0, i.e. composite, OpenType fonts with `CFF` tables are not treated as CFF fonts if their glyph mapping is non-default (issue 9915) This particular code-path has been the source of numerous regressions to date, so hopefully this patch won't cause any more of those. Fixes 9915.	2018-07-29 23:06:15 +02:00
Jonas Jenwald	cfdb597e4a	Ensure that the `CIDSystemInfo` strings, in Type0 fonts, are correctly decoded This isn't directly related to the subsequent patch, but just something that I happened to notice while poking around in the font code.	2018-07-29 23:06:15 +02:00
Tim van der Meij	3521424576	Merge pull request #9920 from Snuffleupagus/getMetadata-linearization [api-minor] Add an `IsLinearized` property to the `PDFDocument.documentInfo` getter, to allow accessing the linearization status through the API (via `PDFDocumentProxy.getMetadata`)	2018-07-29 20:23:22 +02:00
Tim van der Meij	f45450bd78	Merge pull request #9931 from Snuffleupagus/refactor-getPage Refactor `getPage` (in the worker), and attempt to use the `Linearization` dictionary to lookup the first Page	2018-07-29 19:33:46 +02:00
Tim van der Meij	a2c317f12b	Merge pull request #9925 from Snuffleupagus/StreamsSequenceStream-maybeLength Attempt to estimate the minimum required `buffer` length when initializing `StreamsSequenceStream` instances	2018-07-29 16:52:34 +02:00
Jonas Jenwald	ec3728b540	Use the `Linearization` dictionary, if it exists, when fetching the first Page Since PDF.js already supports range requests and streaming, not to mention chunked rendering, attempting to use the `Linearization` dictionary in `PDFDocument.getPage` probably isn't going to improve performance in any noticeable way. Nonetheless, when `Linearization` data is available, it will allow looking up the first Page directly without having to descend into the `Pages` tree to find the correct object.	2018-07-28 22:23:36 +02:00
Jonas Jenwald	fbb25ff4e2	Move `getPage`, on the worker side, from `Catalog` and into `PDFDocument` instead Addresses an existing TODO, and avoids having to pass in a `pageFactory` when creating `Catalog` instances.	2018-07-28 22:23:36 +02:00
Jonas Jenwald	81b471c781	[Regression] Convert `Catalog.builtInCMapCache` into a `Map`, instead of an Object, to ensure that it's correctly reset (PR 8064 follow-up) With the `builtInCMapCache` being a simple Object, it unfortunately means that the `Catalog.cleanup` method isn't resetting it as intended. By just replacing the `builtInCMapCache` with an empty Object, existing references to it will not actually be updated. The result is that e.g. `Page` instances still keep references to, what should have been removed, CMap data. To fix these problems, the `builtInCMapCache` is converted into a `Map` instead (since it can be easily reset).	2018-07-28 22:20:43 +02:00
bion	c31ddf7edc	[api-minor] Include export value for checkboxes	2018-07-28 00:30:41 -07:00
Jonas Jenwald	928b89382e	[api-minor] Add an `IsLinearized` property to the `PDFDocument.documentInfo` getter, to allow accessing the linearization status through the API (via `PDFDocumentProxy.getMetadata`) There was a (somewhat) recent question on IRC about accessing the linearization status of a PDF document, and this patch contains a simple way to expose that through already existing API methods. Please note that during setup/parsing in `PDFDocument` the linearization data is already being fetched and parsed, provided of course that it exists. Hence this patch will not cause any additional data to be loaded.	2018-07-26 15:54:19 +02:00
Jonas Jenwald	8a4466139b	Simplify the `DocumentInfoValidators` definition With this file now being a proper (ES6) module, it's no longer (technically) necessary for this structure to be lazily initialized. Considering its size, and simplicity, I therefore cannot see the harm in letting `DocumentInfoValidators` just be simple Object instead. While I'm not aware of any bugs caused by the current code, it cannot hurt to add an `isDict` check in `PDFDocument.documentInfo` (since the current code assumes that `infoDict` being defined implies it also being a Dictionary). Finally, the patch also converts a couple of `var` to `let`/`const`.	2018-07-26 15:54:01 +02:00
Jonas Jenwald	2d51bce941	Remove unnecessary `stream.length` check from `PDFDocument.linearization` Note first of all that `PDFDocument` will be initialized with either a `Stream` or a `ChunkedStream`, and that both of these have `length` getters. Secondly, the `PDFDocument` constructor will assert that the `stream` has a non-zero (and positive) length. Hence there's no point in checking `stream.length` in the `linearization` getter.	2018-07-26 15:54:01 +02:00
Jonas Jenwald	32bfa55d98	Attempt to estimate the minimum required `buffer` length when initializing `StreamsSequenceStream` instances For most other `DecodeStream` based streams, we'll attempt to estimate the minimum `buffer` length based on the raw stream data. The purpose of this is to avoid having to unnecessarily re-size the `buffer`, thus reducing the number of intermediate allocations necessary when decoding the stream data. However, currently no such optimization is attempted for `StreamsSequenceStream`, and given that they can often be quite large that seems unfortunate. To improve this, at least somewhat, this patch utilizes the raw sizes of the `StreamsSequenceStream` sub-streams to estimate the minimum required `buffer` length. Most likely this patch won't have a huge effect on memory consumption, however for pathological cases it should help reduce peak memory usage slightly. One example is the PDF file in issue 2813, where currently the `StreamsSequenceStream` instances would grow their `buffer`s as `2 MiB -> 4 MiB -> 8 MiB -> 16 MiB -> 32 MiB`. With this patch, the same stream `buffer`s grow as `8 MiB -> 16 MiB -> 32 MiB`, thus avoiding a total of `12 MiB` of intermediate allocations (since there are two `StreamsSequenceStream`s used, for rendering/text-extraction).	2018-07-26 13:42:59 +02:00
Jonas Jenwald	36b683ca55	Provide custom messages for the `no-restricted-globals` ESLint rule, and refactor the `.eslintrc` files (PR 9868 follow-up) Without providing useful (custom) error messages for the `no-restricted-globals` rule, see https://eslint.org/docs/rules/no-restricted-globals, it's quite likely that the rule will be incorrectly disabled rather than the required globals being imported as intended. To reduce duplication of the `no-restricted-globals` rule in multiple `.eslintrc` files, it's instead moved to the top-level `.eslintrc` file and disabled as needed on a folder/file basis outside of `/src` and `/web`.	2018-07-23 14:10:13 +02:00
Jonas Jenwald	8ec99b200c	Prevent Metadata/XML parsing from breaking `PDFDocumentProxy.getMetadata` when no XML root document is found (issue 8884) With the new XML parser, see PR 9573, the referenced PDF file now causes `getMetadata` to fail when incomplete XML tags are encountered. This provides a simple, and hopefully generally useful, work-around that may also help prevent future bugs. (Without being able to reproduce nor even understand the other (non XML) errors mentioned in issue 8884, I'd say that this patch is enough to close that one as fixed.)	2018-07-18 11:37:40 +02:00
Tim van der Meij	61db85ab64	Merge pull request #9886 from Snuffleupagus/bug-1473809 Prevent errors in `sanitizeTTProgram`, during parsing of CALL functions, when encountering invalid functions stack deltas (bug 1473809)	2018-07-15 17:23:52 +02:00
Jonas Jenwald	8e76d26e5b	Move the `toRoman` helper function out of the `Util` scope Compared to all the other (static) methods in `Util`, the `toRoman` one looks slightly out of place. Even more so considering that `Util` is being exposed through `pdfjsLib`, where access to a Roman numerals conversion method doesn't make much sense.	2018-07-10 10:45:25 +02:00
Jonas Jenwald	c1c49badff	Remove the, now unused, `Util.inherit` helper function	2018-07-10 10:29:47 +02:00
Jonas Jenwald	2b25deb84c	Prevent errors in `sanitizeTTProgram`, during parsing of CALL functions, when encountering invalid functions stack deltas (bug 1473809) I was feeling bored; so this is a very quick, and somewhat naive, attempt at fixing the bug. The breaking error, i.e. `Error during font loading: invalid array length`, was thrown when attempting to re-size the `stack` to a negative length when parsing the CALL functions. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1473809.	2018-07-10 09:45:55 +02:00
Jonas Jenwald	bf6d45f85a	Convert `CMap` and `IdentityCMap` to ES6 classes Also changes `var` to `let`/`const` in code already touched in the patch.	2018-07-09 21:12:01 +02:00
Jonas Jenwald	b773b356af	Convert `NameOrNumberTree`, `NameTree`, and `NumberTree` to ES6 classes Also changes `var` to `let`/`const` in code already touched in the patch.	2018-07-09 21:12:01 +02:00
Jonas Jenwald	ba1af46709	Convert `CompiledFont`, `TrueTypeCompiled`, and `Type2Compiled` to ES6 classes Also changes `var` to `let`/`const` in code already touched in the patch.	2018-07-09 21:12:01 +02:00
Jonas Jenwald	775763a091	Ensure that `CompiledFont.compileGlyph` always returns an Array (PR 6141 follow-up) PR 6141 changed `CompiledFont.compileGlyph` to, in the general case, return an Array. However, that PR apparently forgot to update the no-glyph, empty-glyph, and endchar-glyph code-paths and a String was still being (incorrectly) returned. Given the way that `FontFaceObject.getPathGenerator` (on the API side) is implemented, this shouldn't have caused any bugs despite the Worker possibly returning unexpected data.	2018-07-09 21:12:01 +02:00
Tim van der Meij	646d81cd09	Merge pull request #9837 from timvandermeij/unreachable Replace `NotImplementedException` with `unreachable`	2018-07-09 21:10:36 +02:00
Tim van der Meij	907c7f190b	Convert `src/core/pdf_manager.js` to ES6 classes/syntax	2018-07-08 16:43:46 +02:00
Jonas Jenwald	a9ce4e8417	Stop exposing the `URL` polyfill in the global scope This moves/exposes the `URL` polyfill similarly to the existing `ReadableStream` polyfill, rather than exposing it globally, to avoid interfering with any "outside" code. Both the `URL` and `ReadableStream` polyfills are now exposed on the `pdfjsLib` object, such that they are accessible to the viewer components. Furthermore, the `no-restricted-globals` ESLint rule is also enabled to prevent accidental usage of the native `URL`/`ReadableStream` implementations directly in the `src/` and `web/` folders; see also https://eslint.org/docs/rules/no-restricted-globals Addresses the remaining TODO in https://github.com/mozilla/pdf.js/projects/6	2018-07-04 09:16:28 +02:00
Tim van der Meij	99f8f2c275	Merge pull request #9853 from Snuffleupagus/re-render-after-cancel Fix re-rendering, using the same canvas, when rendering was previously cancelled (PR 8519 follow-up)	2018-06-29 23:25:43 +02:00
Tim van der Meij	6fa2c779b5	Merge pull request #9838 from Snuffleupagus/invalid-path-OPS Error, rather than warn, once a number of invalid path operators are encountered in `EvaluatorPreprocessor.read` (bug 1443140)	2018-06-28 23:15:25 +02:00
Jonas Jenwald	bf0aca86d7	Fix re-rendering, using the same canvas, when rendering was previously cancelled (PR 8519 follow-up) Currently if `RenderTask.cancel` is called immediately after rendering was started, then by the time that `InternalRenderTask.initializeGraphics` is called rendering will already have been cancelled. However, we're still inserting the canvas into the `canvasInRendering` map, thus breaking any future attempts at re-rendering using the same canvas. Considering that `InternalRenderTask.cancel` always removes the canvas from the map, I cannot imagine that we'd ever want to re-add it after rendering was cancelled (it was likely just a simple oversight in PR 8519). Fixes 9456.	2018-06-28 22:56:37 +02:00
Tim van der Meij	14b69a4c1c	Merge pull request #9729 from Snuffleupagus/gulp-image_decoders Add a `gulp image_decoders` command to package the image decoders (i.e. jpg.js, jpx.js, jbig2.js) separately, and publish them in pdfjs-dist	2018-06-26 23:27:32 +02:00
Jonas Jenwald	74e9999044	Add unit-tests for `PDFPageProxy.stats` (PR 9245 follow-up) This wasn't included in PR 9245, since all the API options were still global at that time. Writing the unit-tests also uncovered an issue with `getOperatorList` not starting the "Page Request" timer.	2018-06-25 14:20:49 +02:00
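
Note on commit 81b471c781: the message describes a cache that was "reset" by assigning a fresh empty Object, which leaves any previously handed-out references pointing at the old, still-populated Object. The sketch below illustrates that general JavaScript behaviour only. `ObjectCacheOwner`, `MapCacheOwner`, `demo`, and the cache key are hypothetical names made up for illustration; they are not the PDF.js `Catalog` code.

```javascript
// Hypothetical stand-ins for a cache owner; NOT the actual PDF.js classes.
class ObjectCacheOwner {
  constructor() {
    this.cache = Object.create(null);
  }
  cleanup() {
    // Rebinds only this property; earlier references keep the old Object.
    this.cache = Object.create(null);
  }
}

class MapCacheOwner {
  constructor() {
    this.cache = new Map();
  }
  cleanup() {
    // Empties the one shared Map in place, so every holder sees the reset.
    this.cache.clear();
  }
}

function demo() {
  // Object-based cache: a consumer grabs a reference before cleanup.
  const objOwner = new ObjectCacheOwner();
  const objRef = objOwner.cache;
  objRef["example-cmap"] = "cmap data"; // illustrative key and value
  objOwner.cleanup();
  const staleKeys = Object.keys(objRef).length; // old data is still reachable

  // Map-based cache: the same sequence, but cleanup clears in place.
  const mapOwner = new MapCacheOwner();
  const mapRef = mapOwner.cache;
  mapRef.set("example-cmap", "cmap data");
  mapOwner.cleanup();
  const remaining = mapRef.size;

  return { staleKeys, remaining };
}
```

In this sketch the consumer holding `objRef` still sees its entry after `cleanup`, while the consumer holding `mapRef` sees an empty cache, which is the difference the commit message relies on.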

3580 Commits