5948 Commits

Author SHA1 Message Date
Tim van der Meij
283f2dfcc3
Merge pull request #10022 from janpe2/svg-Tr
Implement text rendering modes in SVG backend
2018-08-29 23:51:07 +02:00
Tim van der Meij
27ebb41b8f
Merge pull request #10020 from Snuffleupagus/addon-prefs-no-eslint
Ensure that the built `PdfJsDefaultPreferences.jsm` file won't be affected/touched during tree-wide ESLint rule changes in `mozilla-central` (PR 9571 follow-up)
2018-08-29 22:40:56 +02:00
Jonas Jenwald
d8aaa2f978 Update to the current year, i.e. 2018, in the bundle license headers 2018-08-28 23:46:56 +02:00
Jani Pehkonen
c426ea376c Implement text rendering modes in SVG backend 2018-08-29 00:42:07 +03:00
cheryly279
29c0ea159d Adding chunkname to async loaded code
Better name
2018-08-27 17:17:32 -04:00
Jonas Jenwald
95e5bad4c4 Attempt to find truncated endstream commands, in the fallback code-path, in Parser.makeStream (issue 10004)
Apparently there's some PDF generators, in this case the culprit is "Nooog Pdf Library / Nooog PStoPDF v1.5", that manage to mess up PDF creation enough that endstream[1] commands actually become truncated.

*Please note:* The solution implemented here isn't perfect, since it won't be able to cope with PDF files that contains a *mixture* of correct and truncated endstream commands.
However, considering that this particular mode of corruption *fortunately* doesn't seem very common[2], a slightly less complex solution ought to suffice for now.

Fixes 10004.

---
[1] Scanning through the PDF data to find endstream commands becomes necessary, in order to determine the stream length in cases where the `Length` entry of the (stream) dictionary is missing/incorrect.

[2] I cannot recall having seen any (previous) issues/bugs with "Missing endstream" errors.
2018-08-26 11:51:11 +02:00
Jonas Jenwald
c81cbe113c Extract the "scanning for endstream command" part of Parser.makeStream into a helper method
With this code now living in a separate method, it can be simplified slightly (e.g. by using early returns).
2018-08-26 11:51:09 +02:00
Tim van der Meij
436d2efa8a
Merge pull request #10007 from Snuffleupagus/ColorSpace-class
Convert the code in `src/core/colorspace.js to use ES6 classes
2018-08-25 18:45:40 +02:00
Tim van der Meij
4a0d15aa0e
Slightly simplify the catalog code 2018-08-25 16:40:59 +02:00
Tim van der Meij
aec236f6d8
Convert the Catalog class, in src/core/obj.js, to ES6 syntax 2018-08-25 16:38:22 +02:00
Jonas Jenwald
a182907592 Replace all occurences of var with let/const in src/core/colorspace.js 2018-08-25 03:20:21 +02:00
Jonas Jenwald
ce9a38c536 Convert the code in `src/core/colorspace.js to use ES6 classes
Reduces the amount of boilerplate code when defining the the sub-classes.

Please note that a couple of the closures were kept, since it's not (yet) possible to include helper functions inside of `class`es.
2018-08-25 03:20:19 +02:00
Jonas Jenwald
45b7b861b8 Remove the unused defaultColor property on ColorSpace instances
This property is not only completely unused now, it never actually appears to have been used. Even though the memory savings, from not initializing these extra typed arrays, won't be significant in the grand scheme of things it still seems completely unnecessary to keep allocating this data.

As far as I can tell, the main reason for the existence of `defaultColor` seem to be for documentation purposes. Hence the code is changed into comments instead, to keep the information around (but without the unnecessary allocations).
2018-08-23 11:16:52 +02:00
Jonas Jenwald
099ed08852 Add support for async/await using Babel
For proof-of-concept, this patch converts a couple of `Promise` returning methods to use `async` instead.
Please note that the `generic` build, based on this patch, has been successfully testing in IE11 (i.e. the viewer loads and nothing is obviously broken).

Being able to use modern JavaScript features like `async`/`await` is a huge plus, but there's one (obvious) side-effect: The size of the built files will increase slightly (unless `SKIP_BABEL == true`). That's unavoidable, but seems like a small price to pay in the grand scheme of things.

Finally, note that the `chromium` build target was changed to no longer skip Babel translation, since the Chrome extension still supports version `49` of the browser (where native `async` support isn't available).
2018-08-19 16:54:11 +02:00
Tim van der Meij
4ea663aa8a
Merge pull request #9987 from Snuffleupagus/rm-createBlob
[api-minor] Remove the obsolete `createBlob` helper function
2018-08-19 16:43:36 +02:00
Jonas Jenwald
75923ea515 Remove the unused PDFDocument.mainXRefEntriesOffset method
Not only is this method completely unused *now*, looking through the history of the code it never appears to have been used for anything either.
Years ago `mainXRefEntriesOffset` was included when creating `XRef` instances, however it wasn't actually used for anything (the parameter was never checked, nor assigned to a property on `XRef`).

If this method ever becomes useful (again) it's easy enough to restore it thanks to version control, but including dead code in the builds just seems wasteful.
2018-08-19 14:08:39 +02:00
Jonas Jenwald
50a47be190 [api-minor] Remove the obsolete createBlob helper function
At this point in time, all supported browsers have native support for `Blob`; please see https://developer.mozilla.org/en-US/docs/Web/API/Blob/Blob#Browser_compatibility.
Furthermore, note how the helper function was throwing an error if `Blob` isn't available anyway.
2018-08-19 13:37:19 +02:00
Jonas Jenwald
497b765ede Attempt to combine separate beginText/endText sequences in getTextContent (issue 9984)
Please note that while this *improves* issue 9984 slightly (and likely others too), it's not a complete solution.
The remaining issues are related to the, more general, problems with the existing heuristics related to attempting to combine separate text items.
2018-08-18 13:45:32 +02:00
Jonas Jenwald
bc89edb8f0 Ensure that Uint8ClampedArray is used for image data transfered by getTransfers (PR 9802 follow-up)
One of the `QueueOptimizer` cases wasn't updated to use `Uint8ClampedArray`s, which leads to inconsistent image data on the API side (but no actual rendering bugs, as far as I can tell).
To prevent future errors, a non-production/test-only `assert` was added to ensure that the relevant image data only uses `Uint8ClampedArray`s.
2018-08-16 10:29:44 +02:00
Tim van der Meij
1268aea2b6
Merge pull request #9975 from Snuffleupagus/getDestination-refactor
Re-factor `destinations`/`getDestination` to reduce unnecessary duplication, and reject non-string inputs
2018-08-12 15:51:58 +02:00
Tim van der Meij
af19ed6ee9
Merge pull request #9822 from timvandermeij/annotations
[api-minor] Refactor the annotation code to be asynchronous
2018-08-11 20:39:50 +02:00
dmitryskey
3741becb9b
[api-minor] Refactor the annotation code to be asynchronous
This commit is the first step towards implementing parsing for the
appearance streams of annotations.

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Co-authored-by: Tim van der Meij <timvandermeij@gmail.com>
2018-08-11 19:00:29 +02:00
Jonas Jenwald
1179584fd6 Reject getDestination, in the API, for non-string inputs
Note how e.g. the `getPage` method does basic validation of the input.
2018-08-11 16:06:35 +02:00
Jonas Jenwald
b74c813353 Re-factor destinations/getDestination, in the Catalog, to reduce unnecessary duplication
Currently, these two methods contain the same boilerplate code for getting the /Dests data.
2018-08-11 16:04:58 +02:00
Jonas Jenwald
06d1ff5af4 Tweak the MMType1 font detection in getFontFileType to improve font telemetry (PR 9961 follow-up)
Please note that this patch does *not* affect rendering in any way, however it's relevant for font telemetry[1].

According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1904956, Type1C is a valid subtype for *both* Type1 and MMType1 fonts.

---
[1] Refer to the font telemetry results in https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2018-06-25&keys=__none__!__none__!__none__&max_channel_version=nightly%252F62&measure=PDF_VIEWER_FONT_TYPES&min_channel_version=nightly%252F59&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2018-05-07&table=0&trim=1&use_submission_date=0

See also https://github.com/mozilla/pdf.js/wiki/Enumeration-Assignments-for-the-Telemetry-Histograms#pdf_viewer_font_types for help with interpreting the data.
2018-08-08 12:18:37 +02:00
Jonas Jenwald
f78efd883e Attempt to throw MissingPDFException when applicable in node_stream.js (issue 9791) 2018-08-06 10:00:03 +02:00
Tim van der Meij
4111871ac5
Merge pull request #9958 from brendandahl/always-fallback
Always fallback to system font on font failure.
2018-08-05 19:58:48 +02:00
Jonas Jenwald
3177f6aa55 Parse the font file to determine the correct type/subtype, rather than relying on the (often incorrect) data in the font dictionary
The current font type/subtype detection code is quite inconsistent/unwieldy. In some cases it will simply assume that the font dictionary is correct, in others it will somewhat "arbitrarily" check the actual font file (more of these cases have been added over the years to fix specific bugs).

As is evident from e.g. issue 9949, the font type/subtype detection code is continuing to cause issues. In an attempt to get rid of these hacks once and for all, this patch instead re-factors the type/subtype detection to *always* parse the font file.

Please note that, as far as I can tell, we still appear to need to rely on the composite font detection based on the font dictionary. However, even if the composite/non-composite detection would get it wrong, that shouldn't really matter too much given that there's basically only two different code-paths (for "TrueType-like" vs "Type1-like" fonts).
2018-08-05 11:13:16 +02:00
Jonas Jenwald
9bbca04579 Add a (basic) isCFFFile helper function to detect CFF font files
Compared to most other font formats, the CFF doesn't have a constant header which makes is slightly more difficult to detect such font files.

Please refer to the Compact Font Format specification: https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5176.CFF.pdf#G3.32094
2018-08-05 11:13:14 +02:00
Jonas Jenwald
f4db38aadf Update the TrueType font file detection to also recognize the Mac specific header 'true'
Please refer to the TrueType specification: https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6.html#ScalerTypeNote
2018-08-05 10:33:56 +02:00
Brendan Dahl
5f67a6a237 Always fallback to system font on font failure.
The font in the PDF is marked as a CIDFontType0, but the font file is
actually a true type font. To fully address this issue we should really
peek into the font file and try to determine what it is. However, this
is the first case of this issue, so I think this solution is acceptable for
now.
2018-08-03 16:49:22 -07:00
Tim van der Meij
f19ee127a3
Merge pull request #9874 from boundlesshq/master
[api-minor] Include export value for checkboxes
2018-08-03 23:43:23 +02:00
Jonas Jenwald
a504befc76 Stop warning for non-Name /Filter entries in the PDFImage constructor (PR 9897 follow-up)
Fixes a stupid oversight on my part, since /Filter may (obviously) contain an Array, which resulted in unnecessary console warning spam in perfectly valid PDF files.
Note that it still makes sense to check that /Filter is actually a Name, before attempting to access its `name` property, but the warning should definitely be removed.
2018-08-03 10:23:08 +02:00
Brian
2a665ebad4 Removed Extraneous Matrix Check in CalRGB Conversion 2018-08-02 10:16:42 -07:00
Tim van der Meij
716acf63d4
Merge pull request #9938 from Snuffleupagus/issue-9915
Ensure that Type0, i.e. composite, OpenType fonts with `CFF ` tables are *not* treated as CFF fonts if their glyph mapping is non-default (issue 9915)
2018-08-02 00:11:18 +02:00
Jonas Jenwald
3ce420131f Prefer the Width/Height of the image data, rather than the image dictionary, for JPEG 2000 images (issue 9650)
According to the PDF specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=45
> When using the JPXDecode filter with image XObjects, the following changes to and constraints on some entries in the image dictionary shall apply (see 8.9.5, "Image Dictionaries" for details on these entries):
>
>  - Width and Height shall match the corresponding width and height values in the JPEG2000 data.
>
>  - . . .

Hence it seems reasonable to use the Width/Height of the image data *itself*, rather than the image dictionary when there's a mismatch. Given that JPEG 2000 images are already being parsed, in order to obtain basic parameters, the actual Width/Height is readily available in the `PDFImage` constructor.
2018-08-01 16:42:26 +02:00
Jonas Jenwald
17f65908ae Add more validation of the /Filter entry, in image dictionaries, to the PDFImage constructor
Given that the code is currently assuming that the /Filter entry is a `Name`, it cannot hurt to actually ensure that's the case.

Also fixes an error message, for JPEG 2000 images with unsupported ColorSpaces, since `this.numComps` hasn't been initialized when it's accessed during the `throw new Error()` invocation.
2018-08-01 16:41:15 +02:00
Jonas Jenwald
17eac2d48a Ensure that Type0, i.e. composite, OpenType fonts with CFF tables are *not* treated as CFF fonts if their glyph mapping is non-default (issue 9915)
This particular code-path has been the source of *numerous* regressions to date, so hopefully this patch won't cause any more of those.

Fixes 9915.
2018-07-29 23:06:15 +02:00
Jonas Jenwald
cfdb597e4a Ensure that the CIDSystemInfo strings, in Type0 fonts, are correctly decoded
This isn't directly related to the subsequent patch, but just something that I happened to notice while poking around in the font code.
2018-07-29 23:06:15 +02:00
Tim van der Meij
3521424576
Merge pull request #9920 from Snuffleupagus/getMetadata-linearization
[api-minor] Add an `IsLinearized` property to the `PDFDocument.documentInfo` getter, to allow accessing the linearization status through the API (via `PDFDocumentProxy.getMetadata`)
2018-07-29 20:23:22 +02:00
Tim van der Meij
f45450bd78
Merge pull request #9931 from Snuffleupagus/refactor-getPage
Refactor `getPage` (in the worker), and attempt to use the `Linearization` dictionary to lookup the first Page
2018-07-29 19:33:46 +02:00
Tim van der Meij
a2c317f12b
Merge pull request #9925 from Snuffleupagus/StreamsSequenceStream-maybeLength
Attempt to estimate the minimum required `buffer` length when initializing `StreamsSequenceStream` instances
2018-07-29 16:52:34 +02:00
Jonas Jenwald
ec3728b540 Use the Linearization dictionary, if it exists, when fetching the first Page
Since PDF.js already supports range requests and streaming, not to mention chunked rendering, attempting to use the `Linearization` dictionary in `PDFDocument.getPage` probably isn't going to improve performance in any noticeable way.
Nonetheless, when `Linearization` data is available, it will allow looking up the first Page *directly* without having to descend into the `Pages` tree to find the correct object.
2018-07-28 22:23:36 +02:00
Jonas Jenwald
fbb25ff4e2 Move getPage, on the worker side, from Catalog and into PDFDocument instead
Addresses an existing TODO, and avoids having to pass in a `pageFactory` when creating `Catalog` instances.
2018-07-28 22:23:36 +02:00
Jonas Jenwald
81b471c781 [Regression] Convert Catalog.builtInCMapCache into a Map, instead of an Object, to ensure that it's correctly reset (PR 8064 follow-up)
With the `builtInCMapCache` being a simple Object, it unfortunately means that the `Catalog.cleanup` method isn't resetting it as intended.
By just replacing the `builtInCMapCache` with an empty Object, existing references to it will not actually be updated. The result is that e.g. `Page` instances still keeps references to, what should have been removed, CMap data.

To fix these problems, the `builtInCMapCache` is converted into a `Map` instead (since it can be easily reset).
2018-07-28 22:20:43 +02:00
bion
c31ddf7edc [api-minor] Include export value for checkboxes 2018-07-28 00:30:41 -07:00
Jonas Jenwald
928b89382e [api-minor] Add an IsLinearized property to the PDFDocument.documentInfo getter, to allow accessing the linearization status through the API (via PDFDocumentProxy.getMetadata)
There was a (somewhat) recent question on IRC about accessing the linearization status of a PDF document, and this patch contains a simple way to expose that through already existing API methods.
Please note that during setup/parsing in `PDFDocument` the linearization data is already being fetched and parsed, provided of course that it exists. Hence this patch will *not* cause any additional data to be loaded.
2018-07-26 15:54:19 +02:00
Jonas Jenwald
8a4466139b Simplify the DocumentInfoValidators definition
With this file now being a proper (ES6) module, it's no longer (technically) necessary for this structure to be lazily initialized. Considering its size, and simplicity, I therefore cannot see the harm in letting `DocumentInfoValidators` just be simple Object instead.

While I'm not aware of any bugs caused by the current code, it cannot hurt to add an `isDict` check in `PDFDocument.documentInfo` (since the current code assumes that `infoDict` being defined implies it also being a Dictionary).

Finally, the patch also converts a couple of `var` to `let`/`const`.
2018-07-26 15:54:01 +02:00
Jonas Jenwald
2d51bce941 Remove unnecessary stream.length check from PDFDocument.linearization
Note first of all that `PDFDocument` will be initialized with either a `Stream` or a `ChunkedStream`, and that both of these have `length` getters. Secondly, the `PDFDocument` constructor will assert that the `stream` has a non-zero (and positive) length. Hence there's no point in checking `stream.length` in the `linearization` getter.
2018-07-26 15:54:01 +02:00
Jonas Jenwald
32bfa55d98 Attempt to estimate the minimum required buffer length when initializing StreamsSequenceStream instances
For most other `DecodeStream` based streams, we'll attempt to estimate the minimum `buffer` length based on the raw stream data. The purpose of this is to avoid having to unnecessarily re-size the `buffer`, thus reducing the number of *intermediate* allocations necessary when decoding the stream data.
However, currently no such optimization is attempted for `StreamsSequenceStream`, and given that they can often be quite large that seems unfortunate. To improve this, at least somewhat, this patch utilizes the raw sizes of the `StreamsSequenceStream` sub-streams to estimate the minimum required `buffer` length.

Most likely this patch won't have a huge effect on memory consumption, however for pathological cases it should help reduce peak memory usage slightly.
One example is the PDF file in issue 2813, where currently the `StreamsSequenceStream` instances would grow their `buffer`s as `2 MiB -> 4 MiB -> 8 MiB -> 16 MiB -> 32 MiB`. With this patch, the same stream `buffers`s grow as `8 MiB -> 16 MiB -> 32 MiB`, thus avoiding a total of `12 MiB` of *intermediate* allocations (since there's two `StreamsSequenceStream` used, for rendering/text-extraction).
2018-07-26 13:42:59 +02:00