pdf.js

Author	SHA1	Message	Date
Tim van der Meij	09df1ee0ce	Include a reduced, non-linked PDF file for the attachments API unit test	2019-08-25 15:14:57 +02:00
Jonas Jenwald	711040ecc5	Stop re-throwing errors in the 'GetOperatorList' and 'GetTextContent' handlers, in `src/core/worker.js` These functions aren't returning anything, now that they're using `ReadableStream`s, and it thus doesn't seem necessary to re-throw errors (also given the console message that's caused by it).	2019-08-24 15:56:41 +02:00
Yury Delendik	66e0dd1b06	Use streams for OperatorList chunking (issue 10023) Please note: The majority of this patch was written by Yury, and it's simply been rebased and slightly extended to prevent issues when dealing with `RenderingCancelledException`. By leveraging streams this (finally) provides a simple way in which parsing can be aborted on the worker-thread, which will ultimately help save resources. With this patch worker-thread parsing will only be aborted when the document is destroyed, and not when rendering is cancelled. There are a couple of reasons for this: - The API currently expects the entire OperatorList to be extracted, or an Error to occur, once it's been started. Hence additional re-factoring/re-writing of the API code will be necessary to properly support cancelling and re-starting of OperatorList parsing in cases where the `lastChunk` hasn't yet been seen. - Even with the above addressed, immediately cancelling when encountering a `RenderingCancelledException` will lead to worse performance in e.g. the default viewer. When zooming and/or rotation of the document occurs it's very likely that `cancel` will be (almost) immediately followed by a new `render` call. In that case you'd obviously not want to abort parsing on the worker-thread, since then you'd risk throwing away a partially parsed Page and thus be forced to re-parse it, which will regress perceived performance. - This patch is already somewhat risky, given that it touches fundamentally important/critical code, and trying to keep it somewhat small should hopefully reduce the risk of regressions (and simplify reviewing as well). Time permitting, once this has landed and been in Nightly for a while, I'll try to work on the remaining points outlined above. Co-Authored-By: Yury Delendik <ydelendik@mozilla.com> Co-Authored-By: Jonas Jenwald <jonas.jenwald@gmail.com>	2019-08-24 15:56:40 +02:00
Tim van der Meij	fbe8c6127c	Merge pull request #11059 from Snuffleupagus/boundingBox-more-validation Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries	2019-08-09 22:39:01 +02:00
Jonas Jenwald	d637b25e36	Fallback gracefully when encountering corrupt PDF files with empty /MediaBox and /CropBox entries This is based on a real-world PDF file I encountered very recently[1], although I'm currently unable to recall where I saw it. Note that different PDF viewers handle these sorts of errors differently, with Adobe Reader outright failing to render the attached PDF file whereas PDFium mostly handles it "correctly". The patch makes the following notable changes: - Refactor the `cropBox` and `mediaBox` getters, on the `Page`, to reduce unnecessary duplication. (This will also help in the future, if support for extracting additional page bounding boxes is added to the API.) - Ensure that the page bounding boxes, i.e. `cropBox` and `mediaBox`, are never empty to prevent issues/weirdness in the viewer. - Ensure that the `view` getter on the `Page` will never return an empty intersection of the `cropBox` and `mediaBox`. - Add an optional parameter to `Util.intersect`, to allow checking that the computed intersection isn't actually empty. - Change `Util.intersect` to have consistent return types, since Arrays are of type `Object` and falling back to returning a `Boolean` thus seems strange. --- [1] In that case I believe that only the `cropBox` was empty, but it seemed like a good idea to attempt to fix a bunch of related cases all at once.	2019-08-09 10:18:13 +02:00
Jonas Jenwald	0f78fdb229	Handle some corrupt/truncated JPEG images that are missing the EOI (End of Image) marker (issue 11052) Note that even Adobe Reader cannot render the PDF file completely, which is always a good indication that it's corrupt.	2019-08-08 10:37:41 +02:00
Jonas Jenwald	5ac9c7c384	Support corrupt PDF files with invalid/non-existent Group /CS entries (issue 11045) The PDF file in question tries to reference a non-existent ColorSpace, which should be quite rare in practice.	2019-08-06 14:33:05 +02:00
Tim van der Meij	be70ee236d	Merge pull request #11013 from timvandermeij/annotations-quadpoints [api-minor] Implement quadpoints for annotations in the core layer	2019-08-04 16:06:10 +02:00
Jonas Jenwald	0276385e6e	[api-minor] Fix completely broken `getStats` method by returning stats in Objects, rather than in Arrays (PR 11029 follow-up) With the changes to the `StreamType`/`FontType` "enums" in PR 11029, one unfortunate result is that `getStats` now always returns empty Arrays. Something that everyone, myself included, apparently missed is that you obviously cannot index an Array with Strings :-) I wrongly assumed that the unit-tests would catch any bugs, but they apparently suffered from the same issue as the code in `src/core/`. Another possible option could perhaps be to use `Set`s, rather than objects, but that will require larger changes since `LoopbackPort` (in `src/display/api.js`) doesn't support them.	2019-08-02 14:09:24 +02:00
Jonas Jenwald	a3150166ec	Ensure that `ReadableStream`s are cancelled with actual Errors There's a number of spots in the current code, and tests, where `cancel` methods are not called with appropriate arguments (leading to Promises not being rejected with Errors as intended). In some cases the cancel `reason` is implicitly set to `undefined`, and in others the cancel `reason` is just a plain String. To address this inconsistency, the patch changes things such that cancelling is done with `AbortException`s everywhere instead.	2019-08-01 16:40:46 +02:00
Tim van der Meij	d909b86b28	Merge pull request #11020 from Snuffleupagus/issue-11016 Add a work-around, in `glyphlist.js`, for bad PDF generators which use a non-standard `/f_f` string in the `Encoding` dictionary when referring to the ff ligature (issue 11016)	2019-07-31 23:33:34 +02:00
wangsongyan	c61205d980	Decode the filename when it matches a URL-encoded filename from contentDispositionFilename	2019-07-31 09:33:56 +08:00
Jonas Jenwald	9ad50521b1	Add a work-around, in `glyphlist.js`, for bad PDF generators which use a non-standard `/f_f` string in the `Encoding` dictionary when referring to the ff ligature (issue 11016) This patch will not incur any (measurable) overhead, since the glyphlist is already quite long and one more entry won't really matter, which is important given that this sort of PDF corruption ought to be very rare. Furthermore, this patch purposely does not add a bunch of similarly modified ligature names on pure speculation. Any similar additions, for other ligatures, should only be made if there's real-world examples of PDF files where that's actually necessary.	2019-07-30 17:06:58 +02:00
Tim van der Meij	9114004d5b	[api-minor] Implement quadpoints for annotations in the core layer	2019-07-28 20:36:21 +02:00
Jonas Jenwald	ff90aa4323	Inline the `isCmd` check in the `Parser.shift` method For very large and complex PDF files this will help performance slightly, since `Parser.shift` is called a lot during parsing. This patch was tested using the PDF file from issue 2618, i.e. http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471 (with well over four million `Parser.shift` calls for just the one page), using the following manifest file: ``` [ { "id": "issue2618", "file": "../web/pdfs/issue2618.pdf", "md5": "", "rounds": 100, "type": "eq" } ] ``` This gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, stat -- browser \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ------------ \| ----- \| ------------ \| ----------- \| --- \| ----- \| ------------- Firefox \| Overall \| 100 \| 3386 \| 3322 \| -65 \| -1.92 \| faster Firefox \| Page Request \| 100 \| 1 \| 1 \| 0 \| -8.08 \| Firefox \| Rendering \| 100 \| 3385 \| 3321 \| -65 \| -1.92 \| faster ```	2019-07-22 12:07:36 +02:00
Tim van der Meij	6e96a158f4	Merge pull request #10820 from vlastimilmaca/annot-irt-rt-states Annotations - Added parsing of IRT, RT, State and StateModel	2019-07-17 23:34:31 +02:00
vlastimilmaca	fe49f0f766	Annotations - Implement parsing of IRT, RT, State and StateModel	2019-07-16 23:33:07 +02:00
Jonas Jenwald	c7de6dbe41	Update the `fingerprint` API unit-tests to explicitly check for the expected result The current tests won't catch inadvertent changes to the logic used to obtain/compute the document `fingerprint`.	2019-07-15 11:19:17 +02:00
Jonas Jenwald	c7fb7116d6	Add an API unit-test for the `stopAtErrors` option (PRs 8240 and 8922 follow-up) Also fixes an inconsistency in the 'PageError' handler, for `getOperatorList`, in the API.	2019-07-13 16:06:05 +02:00
Tim van der Meij	e3496041b5	Merge pull request #10950 from monchouchou/master Fixed testing webserver to handle paths correctly on Windows	2019-07-12 23:05:37 +02:00
Tim van der Meij	ed3954fc7a	Merge pull request #10851 from brendandahl/shading-bbox Apply bounding box before using shading patterns.	2019-07-12 22:52:07 +02:00
Tim van der Meij	87f36e3520	Merge pull request #10850 from brendandahl/scale-line-width Scale stroking line width when using a tiling pattern.	2019-07-12 22:50:32 +02:00
Brendan Dahl	6fab0a0dac	Apply bounding box before using shading patterns. Fixes #8092	2019-07-08 14:05:48 -07:00
Brendan Dahl	446efab707	Scale stroking line width when using a tiling pattern.	2019-07-08 13:47:54 -07:00
alephneo	f861d5c0d4	Fixed test/webserver to handle paths correctly on Windows	2019-07-07 02:42:50 +05:30
Tim van der Meij	f1867de492	Merge pull request #10925 from Snuffleupagus/eslint_no-unsanitized Enable the `eslint-plugin-no-unsanitized` ESLint plugin to disallow unsafe usage of e.g. `innerHTML`	2019-06-27 20:32:24 +02:00
Jonas Jenwald	f710eb56e4	Change the signature of the `Parser` constructor to take a parameter object A lot of the `new Parser()` call-sites look quite unwieldy/ugly as-is, with a bunch of somewhat randomly ordered arguments, which we can avoid by changing the constructor to accept an object instead. As an added bonus, this provides better documentation without having to add inline argument comments in the code.	2019-06-23 16:01:45 +02:00
Jonas Jenwald	5bb5e7741d	Enable the `eslint-plugin-no-unsanitized` ESLint plugin to disallow unsafe usage of e.g. `innerHTML` See https://github.com/mozilla/eslint-plugin-no-unsanitized Since we've generally never allowed e.g. `innerHTML`, which is enforced during review, there's only one linting failure with this patch. (Which is white-listed, according to the existing comment and the fact that it's test-only code.)	2019-06-23 13:50:30 +02:00
Jonas Jenwald	876c962235	Ignore Annotations with too large border `width`s, to prevent the `annotationLayer` from rendering it over the surrounding document (bug 1552113) The border `width` will instead fall back to the default value of `1`, rather than ignoring it altogether, to also ensure that e.g. `LinkAnnotation`s become clickable as intended. Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1552113	2019-06-01 15:51:22 +02:00
Jonas Jenwald	2fe9f3ff8f	Add caching to reduce the number of `Ref` objects This is similar to the existing caching used to reduce the number of `Cmd` and `Name` objects. With the `tracemonkey.pdf` file, this patch changes the number of `Ref` objects as follows (in the default viewer): \| \| Loading the first page \| Loading all the pages \| \|----------\|------------------------\|-------------------------\| \| `master` \| 332 \| 3265 \| \| `patch` \| 163 \| 996 \|	2019-05-26 12:23:37 +02:00
Tim van der Meij	bc1eb49a77	Implement creation date only for markup annotations The specification states that `CreationDate` is only available for markup annotations instead of for all annotation types. Moreover, popup annotations are not markup annotations according to the specification, so the creation date inheritance from the parent annotation is also removed there (note that only the modification date is used in e.g., the viewer).	2019-05-25 15:31:06 +02:00
Tim van der Meij	cf07918ccb	Implement contents for every annotation type The specification states that `Contents` can be available for every annotation type instead of only for markup annotations.	2019-05-18 15:52:17 +02:00
Tim van der Meij	c8c937c257	Merge pull request #10794 from janpe2/cidtogidmap-zero Fix glyph at index zero in CIDFontType2 that has a CIDToGIDMap stream	2019-05-15 00:04:39 +02:00
Jonas Jenwald	173fbef05b	Enable the `consistent-return` ESLint rule This rule is already enabled in mozilla-central, and helps ensure more consistent functions/methods, see https://searchfox.org/mozilla-central/rev/b9da45f63cb567244933c77b2c7e827a057d3f9b/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#119-120 Please see https://eslint.org/docs/rules/consistent-return for additional information.	2019-05-11 14:27:21 +02:00
Jonas Jenwald	57ad3a5acb	Fuzzy match in the `should parse PostScript numbers` unit-test, to work-around rounding bugs in Chromium browsers	2019-05-08 14:01:10 +02:00
Jani Pehkonen	05c527f035	Fix glyph 0 in CIDFontType2 that has a CIDToGIDMap stream	2019-05-07 18:44:37 +03:00
Tim van der Meij	be1d6626a7	Implement creation/modification date for annotations This includes the information in the core and display layers. The date parsing logic from the document properties is rewritten according to the specification and now includes unit tests. Moreover, missing unit tests for the color of a popup annotation have been added. Finally the styling of the popup is changed slightly to make the text a bit smaller (it's currently quite large in comparison to other viewers) and to make the drop shadow a bit more subtle. The former is done to be able to easily include the modification date in the popup similar to how other viewers do this.	2019-05-05 14:51:03 +02:00
Jonas Jenwald	5335285cda	Attempt to handle corrupt PDF documents that contain path operators inside of text objects (issue 10542) First of all, while this simple approach appears to work OK in practice I'm not sure if it's the best way of addressing the problem (assuming that you even want to). Second of all, while the solution implemented here only requires tracking/checking one new boolean in order for this to work, I'm nonetheless not entirely happy about this since it will add additional overhead (albeit very small) to the parsing of path operators in PDF documents just for a handful of corrupt ones.	2019-04-30 23:35:33 +02:00
Tim van der Meij	762c58e0fc	Merge pull request #10738 from Snuffleupagus/ViewerPreferences-api [api-minor] Add support for ViewerPreferences in the API (issue 10736)	2019-04-20 18:39:32 +02:00
Jonas Jenwald	34952b732e	Add a `getDocId` method to the `idFactory`, in `Page` instances, to avoid passing around `PDFManager` instances unnecessarily (PR 7941 follow-up) This way we can avoid manually building a "document id" in multiple places in `evaluator.js`, and it also lets us avoid passing in an otherwise unnecessary `PDFManager` instance when creating a `PartialEvaluator`.	2019-04-20 13:11:17 +02:00
Tim van der Meij	55d9b35d37	Merge pull request #10727 from Snuffleupagus/type3-image-resources Support (rare) Type3 fonts which contain image resources (issue 10717)	2019-04-18 23:07:26 +02:00
Jonas Jenwald	311bac3ebb	[api-minor] Add support for ViewerPreferences in the API (issue 10736) Please see the specification, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#M11.9.12864.1Heading.71.Viewer.Preferences Furthermore, note that this patch only adds API support and unit-tests but does not attempt to integrate e.g. the `ViewerPreferences -> Direction` property into the viewer (which would be necessary to address issue 10736). The reason for this is that it's not entirely clear to me exactly if/how that could be implemented; e.g. would it be as simple as setting the `dir` attribute on the `viewerContainer` DOM element, or will it be more complicated? There's also the question of how the `ViewerPreferences -> Direction` value interacts with the `PageMode`, and this will generally require a fair bit of manual testing. Since the direction of the entire viewer depends on the browser locale, there's also a somewhat open question regarding what default value to use for different locales. Finally, if the viewer supports `ViewerPreferences -> Direction` then I'm assuming that it will be necessary to allow users to override the default value, which will require (most likely) new `SecondaryToolbar` buttons and icons for those etc. Hence this patch only lays the necessary foundation for eventually addressing issue 10736, but defers the actual implementation until later. (Time permitting, I'll try to look into the viewer part later.)	2019-04-14 14:20:52 +02:00
Tim van der Meij	ae2a4dc3dd	Implement free text annotations	2019-04-13 18:45:22 +02:00
Jonas Jenwald	be604bd195	Support (rare) Type3 fonts which contain image resources (issue 10717) The Type3 font type is not commonly used in PDF documents, as can be seen from telemetry data such as: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2019-04-09&include_spill=0&keys=__none__!__none__!__none__&max_channel_version=nightly%252F68&measure=PDF_VIEWER_FONT_TYPES&min_channel_version=nightly%252F57&processType=&product=Firefox&sanitize=1&sort_by_value=0&sort_keys=submissions&start_date=2019-03-18&table=0&trim=1&use_submission_date=0 (see also https://github.com/mozilla/pdf.js/wiki/Enumeration-Assignments-for-the-Telemetry-Histograms#pdf_viewer_font_types). Type3 fonts containing image resources are very rare in practice, usually they only contain path rendering operators, but as the issue shows they unfortunately do exist. Currently these Type3-related image resources are not handled in any special way, and given that fonts are document rather than page specific, rendering breaks since the image resources are thus not available to the entire document. Fortunately fixing this isn't too difficult, but it does require adding a couple of Type3-specific code-paths to the `PartialEvaluator`. In order to keep the implementation simple, particularly on the main-thread, these Type3 image resources are completely decoded on the worker-thread to avoid adding too many special cases. This should not cause any issues, only marginally less efficient code, but given how rare this kind of Type3 font is, adding premature optimizations didn't seem at all warranted at this point.	2019-04-13 18:27:50 +02:00
Mukul Mishra	02e46d22d2	Add fetch stream spec	2019-04-07 13:14:03 +02:00
Jonas Jenwald	7a999d1d67	[api-minor] Add basic support for PageLayout in the API and the viewer Please see the specification, https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.2393749, and refer to the inline comments for additional details.	2019-04-05 11:32:01 +02:00
Tim van der Meij	072c5864fb	Merge pull request #10675 from Snuffleupagus/PDFDataTransportStream-disableRange [Firefox regression] Fix `disableRange=true` bug in `PDFDataTransportStream`	2019-04-04 23:07:45 +02:00
Tim van der Meij	b4c3b94592	Merge pull request #6606 from Rob--W/pattern-scaling Improve performance and correctness of Tiling Patterns	2019-03-29 00:01:38 +01:00
Tim van der Meij	f9c58115fc	Merge pull request #10683 from janpe2/type0-noncid-cmap Use CMap in Type0 fonts when CFF is not a CID font	2019-03-28 00:07:08 +01:00
Rob Wu	d3dc8f16b5	TilingPattern: Reverse transform after painting This transform resulted in an incorrectly positioned object when the bounding box's upper-left corner did not start at (0,0), because the translation was not reverted. This patch adds the missing transform. The test file (tiling-pattern-box.pdf) is based on the PDF from #2825. All but the first cube (including the PDF data) have been removed. To trigger the bug that is fixed by this commit, I changed the BBox of the first pattern from "[ 0 0 596 842]" to "[90 0 596 842]". Without this patch, the dashed vertical line that intersects the corners at A and E would disappear.	2019-03-27 17:50:35 +01:00
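
Commit 0276385e6e above notes that stats keyed by Strings cannot be stored usefully in Arrays. The following is a minimal sketch of that pitfall in plain JavaScript, not the actual pdf.js code: the function names `collectStatsInArray` and `collectStatsInObject` and the type names `"FLATE"` and `"DCT"` are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helpers, not taken from pdf.js.

// Assigning with a String key adds a named property to the Array object;
// it does not add an element, so `length` stays 0.
function collectStatsInArray(types) {
  const stats = [];
  for (const type of types) {
    stats[type] = true;
  }
  return stats;
}

// A plain Object (here without a prototype) keeps String keys as
// ordinary enumerable properties.
function collectStatsInObject(types) {
  const stats = Object.create(null);
  for (const type of types) {
    stats[type] = true;
  }
  return stats;
}

const seen = ["FLATE", "DCT"];
const asArray = collectStatsInArray(seen);
const asObject = collectStatsInObject(seen);

console.log(asArray.length); // 0
console.log(JSON.stringify(asArray)); // []
console.log(JSON.stringify(asObject)); // {"FLATE":true,"DCT":true}
```

The Array still looks empty to anything that relies on `length` or on element iteration, which is consistent with the commit's description of `getStats` returning empty Arrays.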

2090 Commits