pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	ca8d2bdce4	Abort parsing when the XRef /W-array contain bogus entries (issue 14303) For this particular PDF document, we have `/W [1 2 166666666666666666666666666]` which obviously makes no sense. While this patch makes no attempt at actually validating the entries in the /W-array, we'll now simply abort all processing when the end of the PDF document has been reached (thus preventing hanging the browser). Please note that this patch doesn't enable the PDF document to be loaded/rendered, but at least it fails "correctly" now. Fixes one of the issues listed in issue 14303, namely the `REDHAT-1531897-0.pdf` document.	2021-11-25 18:35:08 +01:00
Jonas Jenwald	ae4f1ae3e7	Ensure that `ChunkedStream` won't attempt to request data beyond the document size (issue 14303) This bug was surprisingly difficult to track down, since it didn't just depend on range-requests being used but also on how quickly the document was loaded. To even be able to reproduce this locally, I had to use a very small `rangeChunkSize`-value (note the unit-test). The cause of this bug is a bogus entry in the XRef-table, causing us to attempt to request data from beyond the actual document size and thus getting into an infinite loop. Fixes one of the issues listed in issue 14303, namely the `PDFBOX-4352-0.pdf` document.	2021-11-24 19:19:43 +01:00
Jonas Jenwald	0ebac67a9f	Remove the `{BaseViewer, PDFThumbnailViewer}._pagesRequests` caches In the `BaseViewer` this cache is mostly relevant in the `disableAutoFetch = true` mode, since the pages are being initialized lazily in that case. In the `PDFThumbnailViewer` this cache is mostly used for thumbnails that are actually being rendered, as opposed to those created directly from the "regular" pages. Please note that I'm not suggesting that we remove these caches because they're only used in some situations, but rather because they're for all intents and purposes actually redundant. In the API itself, we're already caching both the page-promises and the actual pages themselves on the `WorkerTransport`-instance. Hence these viewer-caches aren't really necessary in practice, and add what to me mostly seems like an unnecessary level of indirection.[1] Given that the viewer now relies on caching in the API itself, this patch also adds a new unit-test to ensure that page-caching works (and keeps working) as expected. --- [1] In the `WorkerTransport.getPage`-method the parameter is being validated on every call, but that's hardly enough code to warrant keeping the "duplicate" caches in the viewer in my opinion.	2021-11-21 11:40:45 +01:00
Jonas Jenwald	6da0944fc7	[api-minor] Replace `PDFDocumentProxy.getStats` with a synchronous `PDFDocumentProxy.stats` getter Please note: These changes will primarily benefit longer documents, somewhat at the expense of e.g. one-page documents. The existing `PDFDocumentProxy.getStats` function, which in the default viewer is called for each rendered page, requires a round-trip to the worker-thread in order to obtain the current document stats. In the default viewer, we currently make one such API-call for every rendered page. This patch proposes replacing that method with a synchronous `PDFDocumentProxy.stats` getter instead, combined with re-factoring the worker-thread code by adding a `DocStats`-class to track Stream/Font-types and only send them to the main-thread the first time that a type is encountered. Note that in practice most PDF documents only use a fairly limited number of Stream/Font-types, which means that in longer documents most of the `PDFDocumentProxy.getStats`-calls will return the same data.[1] This re-factoring will obviously benefit longer documents the most[2], and could actually be seen as a regression for one-page documents, since in practice there'll usually be a couple of "DocStats" messages sent during the parsing of the first page. However, if the user zooms/rotates the document (which causes re-rendering), note that even a one-page document would start to benefit from these changes. Another benefit of having the data available/cached in the API is that unless the document stats change during parsing, repeated `PDFDocumentProxy.stats`-calls will return the same identical object. This is something that we can easily take advantage of in the default viewer, by now only reporting "documentStats" telemetry[3] when the data actually have changed rather than once per rendered page (again beneficial in longer documents).
--- [1] Furthermore, the maximum number of `StreamType`/`FontType` are `10` and `12` respectively, which means that regardless of the complexity and page count in a PDF document there'll never be more than twenty-two "DocStats" messages sent; see `41ac3f0c07/src/shared/util.js (L206-L232)` [2] One example is the `pdf.pdf` document in the test-suite, where rendering all of its 1310 pages only results in a total of seven "DocStats" messages being sent from the worker-thread. [3] Reporting telemetry, in Firefox, includes using `JSON.stringify` on the data and then sending an event to the `PdfStreamConverter.jsm`-code. In that code the event is handled and `JSON.parse` is used to retrieve the data, and in the "documentStats"-case we'll then iterate through the data to avoid double-reporting telemetry; see https://searchfox.org/mozilla-central/rev/8f4c180b87e52f3345ef8a3432d6e54bd1eb18dc/toolkit/components/pdfjs/content/PdfStreamConverter.jsm#515-549	2021-11-20 12:20:55 +01:00
Brendan Dahl	c6cb39ef30	Merge pull request #14262 from Snuffleupagus/issue-14261 Include the /Lang-property, when it exists, in the StructTree-data (issue 14261)	2021-11-19 07:51:21 -08:00
Brendan Dahl	9f4a2cf5ce	Merge pull request #14276 from Snuffleupagus/issue-14242-2 Only show the `loadingIcon`-spinner on visible pages (issue 14242)	2021-11-18 13:43:58 -08:00
Brendan Dahl	3209c013c4	Merge pull request #14247 from calixteman/button [api-minor] Render pushbuttons on their own canvas (bug 1737260)	2021-11-16 08:10:40 -08:00
Jonas Jenwald	7d4c37e988	Use the new iterator in the `PDFPageViewBuffer` unit-tests The previous patch introduced an iterator in the `PDFPageViewBuffer`-class, hence the test-only `_buffer`-getter is no longer necessary.	2021-11-15 14:06:17 +01:00
Jonas Jenwald	971ac8e993	Include the /Lang-property, when it exists, in the StructTree-data (issue 14261) Please note: This is a tentative patch, since I don't have the necessary a11y-software to actually test it.	2021-11-14 12:37:41 +01:00
calixteman	85c6dd59ce	Merge pull request #14268 from calixteman/outline Remove non-displayable chars from outline title (#14267)	2021-11-13 08:12:56 -08:00
Calixte Denizet	7041c62ccf	Remove non-displayable chars from outline title (#14267) - it aims to fix #14267; - there is nothing about chars in range [0-1F] in the specs but Acrobat doesn't display them in any way.	2021-11-13 16:56:08 +01:00
Calixte Denizet	a88ff34eb7	Don't consider space as real space when there is an extra spacing (bug 931481) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=931481; - real space chars are pushed in the chunk but when there is an extra spacing, the next char position must be compared with the previous one; - for example, an extra spacing can cancel a space so visually there is no space.	2021-11-12 18:53:48 +01:00
Calixte Denizet	33ea817b20	[api-minor] Render pushbuttons on their own canvas (bug 1737260) - First step to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1737260; - several interactive pdfs use the possibility to hide/show buttons to show different icons; - render pushbuttons on their own canvas and then insert it into the annotation_layer; - update test/driver.js in order to convert canvases for pushbuttons into images.	2021-11-12 15:37:33 +01:00
Jonas Jenwald	ea1c348c67	Always prefer abbreviated keys, over full ones, when doing any dictionary lookups (issue 14256) Note that issue 14256 was specifically about inline images, please refer to: - https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.1852045 - https://www.pdfa.org/safedocs-unearths-pdf-inline-image-issue/ - https://pdf-issues.pdfa.org/32000-2-2020/clause08.html#H8.9.7 However, during review of the initial PR in https://github.com/mozilla/pdf.js/pull/14257#issuecomment-964469710, it was suggested that we instead do this unconditionally for all dictionary lookups. In addition to re-ordering the existing call-sites in the `src/core`-code, and adding non-PRODUCTION/TESTING asserts to catch future errors, for consistency a number of existing `if`/`switch`-blocks were re-factored to also check the abbreviated keys first.	2021-11-10 11:56:18 +01:00
Calixte Denizet	13ae6d493a	XFA - Encode tag names in UTF-8 when saving (fix #14249)	2021-11-07 21:41:37 +01:00
Tim van der Meij	891f21fba6	Merge pull request #14245 from Snuffleupagus/PDFPageViewBuffer-class Convert `PDFPageViewBuffer` to a standard class, and use a `Set` internally	2021-11-07 14:37:33 +01:00
Calixte Denizet	1681e25008	XFA - Get each page asynchronously in order to avoid blocking the event loop (#14014)	2021-11-06 13:25:03 +01:00
Jonas Jenwald	a774707e31	Remove the `moveToEndOfArray` helper function, since it's unused With the previous patch, this helper function is no longer used and keeping it around will simply increase the size of the builds. This removal is purposely done separately, to make it easy to revert the patch in the future if this helper function would become useful again.	2021-11-06 10:19:17 +01:00
Jonas Jenwald	c62bcb55ac	Adjust the "handles `resize` correctly, with `idsToKeep` provided" unit-test (PR 14238 follow-up) This small change will help validate an important part of the upcoming re-factoring, regarding the correct iteration of the `Set` in the `PDFPageViewBuffer.resize` method in particular.	2021-11-06 10:19:17 +01:00
Jonas Jenwald	fe205efd8d	Add a couple of basic unit-tests for `PDFPageViewBuffer` The `PDFPageViewBuffer`-code is very important for the correct function of the viewer, but it's currently not tested at all. While the `PDFPageViewBuffer` is obviously intended to be used with `PDFPageView`-instances, it only accesses a couple of `PDFPageView` properties/methods and consequently it's fairly easy to unit-test this code with dummy-data. These unit-tests should help improve our confidence in this code, and will also come in handy with other changes that I'm working on (regarding modernizing and re-factoring the `PDFPageViewBuffer`-code).	2021-11-05 19:43:20 +01:00
Jonas Jenwald	611627f5a1	Merge pull request #14219 from Snuffleupagus/getVisibleElements-ids Let `getVisibleElements` return a Set containing the visible element `id`s	2021-11-03 23:49:27 +01:00
Jonas Jenwald	6323f8532a	Let `getVisibleElements` return a Set containing the visible element `id`s Note how in `PDFPageViewBuffer.resize` we're manually iterating through the visible pages in order to build a Set of the visible page `id`s. By instead moving the building of this Set into the `getVisibleElements` helper function, as part of the existing parsing, this code becomes ever so slightly more efficient. Furthermore, more direct access to the visible page `id`s also comes in handy in other parts of the viewer. In the `BaseViewer.isPageVisible` method we no longer need to loop through the visible pages, but can instead directly check if the pageNumber is visible. In the `PDFRenderingQueue.getHighestPriority` method, when checking for "holes" in the page layout, we can also avoid some unnecessary look-ups this way.	2021-11-03 21:13:44 +01:00
Jonas Jenwald	5f77d3719b	Tweak the Bidi-detection heuristics for very short RTL strings (issue 11656) Very short strings can narrowly miss the existing Bidi-detection threshold, leading to incorrect text-selection and copying behaviour. In my testing, neither Adobe Reader nor PDFium seems to handle copying "correctly" for this document. Hence it's not entirely clear to me that we actually want to fix this, since tweaking these heuristics can obviously cause regressions elsewhere (and our test coverage for RTL-text isn't exactly great).	2021-11-03 20:31:57 +01:00
Calixte Denizet	cf8dc750d6	Support rich content in markup annotation - use the xfa parser but in the xhtml namespace.	2021-10-31 13:44:51 +01:00
Tim van der Meij	0e7614df7f	Merge pull request #14180 from Snuffleupagus/bug-1627427 Handle ranges that "overflow" the last byte in `CMap.mapBfRange` (bug 1627427)	2021-10-27 20:06:09 +02:00
Jane-Kotovich	91fc643ff9	[api-minor] Implement securityHandler in the scripting API (bug 1731578)	2021-10-26 23:42:04 +10:00
Jonas Jenwald	aa1b78684f	Handle ranges that "overflow" the last byte in `CMap.mapBfRange` (bug 1627427)	2021-10-24 13:48:38 +02:00
Brendan Dahl	b66239d6dc	Merge pull request #14114 from Snuffleupagus/issue-14110 [api-minor] Include the /Lang-property in the `documentInfo`, and use it in the viewer (issue 14110)	2021-10-19 08:08:08 -07:00
calixteman	bbb64369f1	Merge pull request #13424 from calixteman/chunks2 [api-minor] Fix issues in text selection	2021-10-18 06:14:15 -07:00
Calixte Denizet	61d1063276	Fix issues in text selection - PR #13257 fixed a lot of issues but not all and this patch aims to fix almost all remaining issues. - the idea in this new patch is to compare the position of a new glyph with the last position where a glyph has been drawn; - no spaces are "drawn": a space just moves the cursor but isn't added to the chunk; - so this way a space followed by a cursor move can be treated as only one space: it helps to merge all spaces into one. - to distinguish between real spaces and tracking ones, we used a factor of the space width (from the font) - it was a pretty good idea in general but it fails with some fonts where the space was too big: - in Poppler, they're using a factor of the font size: this is an excellent idea (<= 0.1 * fontSize implies tracking space).	2021-10-17 16:27:05 +02:00
Jonas Jenwald	00720d059a	[api-minor] Include the /Lang-property in the `documentInfo`, and use it in the viewer (issue 14110) Please note: This is a tentative patch, since I don't have the necessary a11y-software to actually test it. To avoid having to add a new API-method just for a single string, I figured that adding the new property to the existing `documentInfo`-data (accessed via `PDFDocumentProxy.getMetadata` in the API) will hopefully be deemed acceptable.	2021-10-16 14:27:47 +02:00
Tim van der Meij	52fce0d17b	Merge pull request #14152 from Snuffleupagus/xfaFactory-typo Fix a `xfaFaxtory` typo in the shadowing in the `PDFDocument.xfaFactory` getter, and some other clean-up	2021-10-16 14:23:47 +02:00
Jonas Jenwald	0041230072	Re-name the `XFAFactory.numberPages` getter to `XFAFactory.numPages` for consistency All other similar getters are called `numPages` throughout the code-base, and improved consistency should always be a good thing.	2021-10-16 12:56:21 +02:00
Jonas Jenwald	fa8c0ef616	[api-minor] Change `PDFFindController` to use the "find"-event directly (issue 12731) Looking at the code, I do have to agree with the point made in issue 12731 about it being unexpected/unhelpful that the `PDFFindController.executeCommand`-method isn't directly usable with the "find"-event. The reason for it being this way is, as so often, for historical reasons: The `executeCommand`-method was added (just) prior to the introduction of the `EventBus` in the viewer. Obviously we cannot simply change the existing `PDFFindController.executeCommand`-method, since that'd be a breaking change in code which has existed for over five years. Initially I figured that we could simply add a new method in `PDFFindController` that'd accept the state from the "find"-event, however after thinking about this and looking through the use-cases in the default viewer I settled on a slightly different approach: Let the `PDFFindController` just listen for the "find"-event (on the `EventBus`-instance) directly instead, which also removes one level of (unneeded) indirection during searching in the default viewer. For GENERIC builds of the PDF.js library, the old `PDFFindController.executeCommand`-method is still available with a deprecation warning.	2021-10-16 10:36:22 +02:00
Jonas Jenwald	bb9c905c5d	Ensure that various URL-related options are applied in the `xfaLayer` too Note how both the annotationLayer and the document outline will apply various URL-related options when creating the link-elements. For consistency the `xfaLayer`-rendering should obviously use the same options, to ensure that the existing options are indeed applied to all URLs regardless of where they originate.	2021-10-02 09:32:23 +02:00
Jonas Jenwald	e6e04694f4	[api-minor] Move the `addDefaultProtocolToUrl`/`tryConvertUrlEncoding` functionality into the `createValidAbsoluteUrl` function Having recently worked with, and reviewed patches touching, this code it seemed that it's probably not a bad idea to move that functionality into `createValidAbsoluteUrl` as new options instead. For the `addDefaultProtocolToUrl` functionality in particular, the existing helper function was not only moved but slightly improved as well. Looking at the code, I realized that there's a small risk that it would incorrectly match a relative URL-string too. With these changes, the `createValidAbsoluteUrl` call-sites in the `src/core/`-code can be simplified a little bit. Please note: This patch may, indirectly, change the format of the `unsafeUrl`-property returned with relevant Annotations and OutlineItems; hence the `api-minor` tag. However, I'd argue that it's actually more correct this way since the whole purpose of `unsafeUrl` is/was to return the URL data as-is without any parsing done.	2021-09-26 14:29:54 +02:00
Calixte Denizet	558e58f354	XFA - Add <a> element in button when an url is detected (bug 1716758) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1716758; - some buttons have a JS action with the pattern `app.launchURL(...)` (or similar) so extract when it's possible the url and generate a <a> element with the href equal to the found url; - pdf.js already had some code to handle that so this patch slightly refactors that.	2021-09-25 21:59:39 +02:00
Jonas Jenwald	1dcd2f0cd3	[api-minor] Add basic support for RTL text-content in PopupAnnotations (issue 14046) In order to implement this, we utilize the existing `bidi` function to infer the text-direction of /T and /Contents entries. While this may not be perfect in cases where one PopupAnnotation mixes LTR and RTL languages, it should work well enough in most cases. To avoid having to add two new properties in lots of annotations, supplementing the existing `title`/`contents`-properties, this patch instead re-factors the existing code such that the properties are replaced by Objects (containing `str` and `dir`). Please note: In order to avoid breaking existing third-party implementations, `GENERIC`-builds of the PDF.js library will still provide the old `title`/`contents`-properties on annotations returned by `PDFPageProxy.getAnnotations`.	2021-09-25 09:18:58 +02:00
Calixte Denizet	97c1e076a1	XFA - Bind items when there's a bindItems entry - In the pdf in issue #14071, some select fields don't contain any values; - the corresponding node has a bindItems and a bind element and the _bindItems function was just not called.	2021-09-24 16:08:58 +02:00
Calixte Denizet	4b0538d07a	Don't save anything in XFA entry if no XFA! (bug 1732344) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1732344 - rename some variables to make the code clearer; - and last but not least, add a unit test to test saving.	2021-09-23 19:51:23 +02:00
Jonas Jenwald	81a1c1cef7	Correctly validate URLs in XFA documents (bug 1731240) With this patch we'll ensure that only valid absolute URLs can be used in XFA documents, similar to the existing validation done for "regular" PDF documents. Furthermore, we'll also attempt to add a default protocol (i.e. `http`) to URLs beginning with "www." in XFA documents as well; this on its own is enough to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1731240	2021-09-21 21:21:01 +02:00
Jonas Jenwald	20eb6ca2ec	Merge pull request #14044 from calixteman/bug1719148 Annotations - Avoid empty value in text field when storage contains something for it (bug 1719148)	2021-09-18 16:31:45 +02:00
Calixte Denizet	eb762ad624	Annotations - Avoid empty value in text field when storage contains something for it (bug 1719148) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1719148; - JS can set a property for a non-rendered annotation using the annotationStorage but the other unset default properties must be used when the annotation is finally rendered; - so this patch just adds the properties already set in the annotationStorage to the default value.	2021-09-18 15:08:22 +02:00
Jonas Jenwald	d854352cd5	Improve the API unit-tests by checking that `PDFPageProxy.render` returns a `RenderTask`-instance This is similar to existing unit-tests, which check for `PDFDocumentProxy`- and `PDFPageProxy`-instances.	2021-09-13 13:34:37 +02:00
Jonas Jenwald	fa7a607d33	Improve the API unit-tests by checking that `getDocument` returns a `PDFDocumentLoadingTask`-instance This is similar to existing unit-tests, which check for `PDFDocumentProxy`- and `PDFPageProxy`-instances.	2021-09-13 13:34:28 +02:00
Jonas Jenwald	7025b9f859	[src/core/writer.js] Support `null` values in the `writeValue` function This fixes something that I noticed, having recently looked at both the `Lexer.getObj` and `writeValue` code. Please note that I unfortunately don't have an example of a form where saving fails without this patch. However, given its overall simplicity and that unit-tests are added, it's hopefully deemed useful to fix this potential issue pro-actively rather than waiting for a bug report. At this point one might, and rightly so, wonder if there are actually any real-world PDF documents where a `null` value is being used? Unfortunately the answer is yes, and we have a couple of examples in the test-suite (although none of those are related to forms); please see: `issue1015`, `issue2642`, `issue10402`, `issue12823`, `issue13823`, and `pr12564`.	2021-09-12 18:24:37 +02:00
Jonas Jenwald	761519ef3f	Merge pull request #13998 from calixteman/bug1729971 Write boolean value when saving a form (bug 1729971)	2021-09-12 15:38:10 +02:00
Jonas Jenwald	a47844d1fc	Let `Lexer.getObj` return a dummy-`Cmd` for commands that start with a non-visible ASCII character (issue 13999) This way we avoid breaking badly generated PDF documents where a non-visible ASCII character is "glued" to a valid command.	2021-09-11 19:54:13 +02:00
Calixte Denizet	474ab7c86d	Write boolean value when saving a form (bug 1729971) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1729971#c4.	2021-09-10 14:10:25 +02:00
Tim van der Meij	680f33c31c	Merge pull request #13961 from Snuffleupagus/simpler-regexp Simplify some regular expressions	2021-09-04 15:39:30 +02:00

895 Commits
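
Commit 6da0944fc7 above describes tracking Stream/Font-types and only sending them to the main-thread the first time that a type is encountered. The following is a minimal sketch of that idea, not the actual pdf.js `DocStats` code: the class name `DocStatsSketch`, the `notify` callback, the message shape, and the type strings are all hypothetical and chosen only for illustration.

```javascript
// Illustrative sketch only: names and message shape are assumptions,
// not the pdf.js `DocStats` implementation.
class DocStatsSketch {
  constructor(notify) {
    // `notify` is a hypothetical stand-in for posting a message to the main thread.
    this._notify = notify;
    this._streamTypes = new Set();
    this._fontTypes = new Set();
  }

  addStreamType(type) {
    return this._add(this._streamTypes, type);
  }

  addFontType(type) {
    return this._add(this._fontTypes, type);
  }

  // Record `type`; notify only if it has not been seen before.
  _add(seen, type) {
    if (seen.has(type)) {
      return false; // already reported, so no message is needed
    }
    seen.add(type);
    this._notify({
      streamTypes: [...this._streamTypes],
      fontTypes: [...this._fontTypes],
    });
    return true;
  }
}

// Usage: repeated types do not trigger additional messages.
const messages = [];
const stats = new DocStatsSketch(msg => messages.push(msg));
for (const type of ["Flate", "Flate", "DCT", "Flate"]) {
  stats.addStreamType(type); // arbitrary placeholder type names
}
stats.addFontType("TrueType");
stats.addFontType("TrueType");
console.log(messages.length); // 3
```

Because a message is only produced for a previously unseen type, the number of messages is bounded by the number of distinct types rather than by the page count, which matches the reasoning given in the commit message.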