pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	a807ffe907	Prevent circular references in XRef tables from hanging the worker-thread (issue 14303) Please note: This patch on its own is sufficient to prevent the worker-thread from hanging; in combination with PR 14311, these PDF documents will both load and render correctly. Rather than focusing on the particular structure of these PDF documents, it seemed (at least to me) to make sense to try and prevent all circular references when fetching/looking-up data using the XRef table. To avoid a solution that required tracking the references manually everywhere, the implementation settled on here instead handles that internally in the `XRef.fetch`-method. This should work, since that method and the `Parser`/`Lexer`-implementations are completely synchronous. Note also that the existing `XRef`-caching, used for all data-types except Streams, should hopefully help to lessen the performance impact of these changes. One potential problem with these changes could be certain browser exceptions, since those are generally not catchable in JavaScript code; however, those would most likely "stop" worker-thread parsing anyway (at least I hope so). Finally, note that I settled on returning dummy-data rather than throwing an exception. This was done to allow parsing, for the rest of the document, to continue such that one bad reference doesn't prevent an entire document from loading. Fixes two of the issues listed in issue 14303, namely the `poppler-91414-0.zip-2.gz-53.pdf` and `poppler-91414-0.zip-2.gz-54.pdf` documents.	2021-11-27 23:50:26 +01:00
Jonas Jenwald	a669fce762	Inline the `isDict`, `isRef`, and `isStream` checks in the `src/core/xref.js` file	2021-11-27 23:49:17 +01:00
Jonas Jenwald	680e0efb9d	Use Array-destructuring in the `XRef.readXRefStream`-method	2021-11-27 23:49:17 +01:00
Jonas Jenwald	a2a5376adf	Merge pull request #14311 from Snuffleupagus/validate-Pages-Count [api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303)	2021-11-27 23:47:05 +01:00
Jonas Jenwald	d0c4bbd828	[api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303) This patch basically extends the approach from PR 10392, by also checking the last page. Currently, in e.g. the `Catalog.numPages`-getter, we're simply assuming that if the /Pages-tree has an integer /Count entry it must also be correct/valid. As can be seen in the referenced PDF documents, that entry may be completely bogus, which causes general parsing to break down elsewhere in the worker-thread (and hang the browser). Rather than hoping that the /Count entry is correct, similar to all other data found in PDF documents, we obviously need to validate it. This turns out to be a little less straightforward than one would like, since the only way to do this (as far as I know) is to parse the entire /Pages-tree and essentially count the pages. To avoid doing that for all documents, this patch tries to take a short-cut by checking if the last page (based on the /Count entry) can be successfully fetched. If so, we assume that the /Count entry is correct and use it as-is, otherwise we'll iterate through (potentially) the entire /Pages-tree to determine the number of pages. Unfortunately these changes will have a number of somewhat negative side-effects (please see a possibly incomplete list below); however, I cannot see a better way to address this bug. - This will slow down initial loading/rendering of all documents, at least by some amount, since we now need to fetch/parse more of the /Pages-tree in order to be able to access the last page of the PDF documents. - For poorly generated PDF documents, where the entire /Pages-tree only has one level, we'll unfortunately need to fetch/parse the entire /Pages-tree to get to the last page. While there's a cache to help reduce repeated data lookups, this will affect initial loading/rendering of some long PDF documents. - This will affect the `disableAutoFetch = true` mode negatively, since we now need to fetch/parse more data during document initialization. While the `disableAutoFetch = true` mode should still be helpful in larger/longer PDF documents, for smaller ones the effect/usefulness may unfortunately be lost. As one small additional bonus, we should now also be able to support opening PDF documents where the /Pages-tree /Count entry is completely invalid (e.g. contains a non-integer value). Fixes two of the issues listed in issue 14303, namely the `poppler-67295-0.pdf` and `poppler-85140-0.pdf` documents.	2021-11-27 21:57:35 +01:00
Tim van der Meij	9a1e27efc5	Merge pull request #14313 from Snuffleupagus/PDFDocument_pagePromises-map Change the `_pagePromises` cache, in the worker, from an Array to a Map	2021-11-27 20:58:23 +01:00
calixteman	bbd8b5ce9f	Merge pull request #14319 from calixteman/xfa_arc XFA - Draw arcs correctly	2021-11-27 11:32:32 -08:00
Calixte Denizet	31e13515f5	XFA - Draw arcs correctly - it aims to fix #14315; - take into account the startAngle to compute the coordinates of the final point.	2021-11-27 19:30:12 +01:00
Jonas Jenwald	b11091c0f9	Merge pull request #14318 from calixteman/14317 Handle sub/super-scripts in rich text	2021-11-27 18:49:52 +01:00
Calixte Denizet	cfdaa57353	Handle sub/super-scripts in rich text - it aims to fix #14317; - change the fontSize and the verticalAlign properties according to the position of the text.	2021-11-27 16:06:09 +01:00
Jonas Jenwald	e439f4d620	Merge pull request #14314 from Snuffleupagus/XFA-viewer-refs-cache-fix [Regression] Prevent errors, during loading, in the viewer for XFA-documents (PR 14295 follow-up)	2021-11-26 20:47:45 +01:00
Jonas Jenwald	8fa5fcfe72	[Regression] Prevent errors, during loading, in the viewer for XFA-documents (PR 14295 follow-up) In the second commit in PR 14295, I forgot that the pages in XFA-documents don't have references (like in regular PDF documents); sorry about that!	2021-11-26 20:21:12 +01:00
Jonas Jenwald	4c56214ab4	Convert `PDFDocument._getLinearizationPage` to an async method This, ever so slightly, simplifies the code and reduces overall indentation.	2021-11-26 19:57:47 +01:00
Jonas Jenwald	080996ac68	Change the `_pagePromises` cache, in the worker, from an Array to a Map Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here. Furthermore, this patch also adds (currently missing) caching for XFA-documents. Loading a couple of such documents in the viewer, with logging added, shows that we're currently re-creating `Page`-instances unnecessarily for XFA-documents.	2021-11-26 19:53:57 +01:00
Jonas Jenwald	2e2d049a9c	Merge pull request #14310 from Snuffleupagus/XRef-bogus-byteWidths Abort parsing when the XRef /W-array contain bogus entries (issue 14303)	2021-11-25 19:57:40 +01:00
Jonas Jenwald	ca8d2bdce4	Abort parsing when the XRef /W-array contain bogus entries (issue 14303) For this particular PDF document, we have `/W [1 2 166666666666666666666666666]` which obviously makes no sense. While this patch makes no attempt at actually validating the entries in the /W-array, we'll now simply abort all processing when the end of the PDF document has been reached (thus preventing hanging the browser). Please note that this patch doesn't enable the PDF document to be loaded/rendered, but at least it fails "correctly" now. Fixes one of the issues listed in issue 14303, namely the `REDHAT-1531897-0.pdf` document.	2021-11-25 18:35:08 +01:00
Tim van der Meij	a2c380ccb3	Merge pull request #14298 from Snuffleupagus/issue-10906 Center pages vertically in PresentationMode (issue 10906)	2021-11-24 21:39:28 +01:00
Tim van der Meij	973932321e	Merge pull request #14304 from Snuffleupagus/huge-XRef-entry Ensure that `ChunkedStream` won't attempt to request data beyond the document size (issue 14303)	2021-11-24 21:28:24 +01:00
Jonas Jenwald	ae4f1ae3e7	Ensure that `ChunkedStream` won't attempt to request data beyond the document size (issue 14303) This bug was surprisingly difficult to track down, since it didn't just depend on range-requests being used but also on how quickly the document was loaded. To even be able to reproduce this locally, I had to use a very small `rangeChunkSize`-value (note the unit-test). The cause of this bug is a bogus entry in the XRef-table, causing us to attempt to request data from beyond the actual document size and thus getting into an infinite loop. Fixes one of the issues listed in issue 14303, namely the `PDFBOX-4352-0.pdf` document.	2021-11-24 19:19:43 +01:00
Jonas Jenwald	5e2aec7dd7	Merge pull request #14299 from SaiKiranMukka/pageviewer-to-async Convert examples/components/pageviewer.js to await/async (issue 14127)	2021-11-24 14:52:31 +01:00
Jonas Jenwald	f7b1da418f	Center pages vertically in PresentationMode (issue 10906) This patch can be tested e.g. with the `sizes.pdf` document in the test-suite. While this patch isn't necessarily the best solution, e.g. it might be possible to solve this with only CSS, it's what I was able to come up with to address an old issue. The solution here re-uses the `spread`-class in PresentationMode, since that one already takes care of centering pages vertically, together with a dummy-page that takes up the entire height of the window. Finally, some PresentationMode-related CSS-rules are also simplified slightly, since the changes in PR 14112 (using Page-scrolling) allow some clean-up here.	2021-11-24 14:09:34 +01:00
Sai Kiran Mukka	711fbe1376	Convert examples/components/pageviewer.js to await/async (issue 14127)	2021-11-24 15:22:21 +05:30
Tim van der Meij	70fc30d97c	Merge pull request #14295 from Snuffleupagus/rm-viewer-_pagesRequests Remove the `{BaseViewer, PDFThumbnailViewer}._pagesRequests` caches	2021-11-21 14:30:37 +01:00
Jonas Jenwald	58a2728647	Ensure that `BaseViewer.#ensurePdfPageLoaded` updates the `PDFLinkService`-pagesRefCache if necessary The issue that this patch fixes has existed ever since the viewer was first re-factored into components, however it only really affects the `disableAutoFetch = true` mode. By default we're fetching all pages in `BaseViewer.setDocument`, and as part of the parsing/initialization we're also populating the `PDFLinkService`-pagesRefCache. The purpose of that cache is to make navigating to any internal destinations faster, by not having to (asynchronously) lookup the pageNumber via the API when handling the destination. In comparison, when the `disableAutoFetch = true` mode is being used we're instead lazily initializing the pages in the `BaseViewer.#ensurePdfPageLoaded`-method. For some reason, that I can only assume is a simple oversight, we're not attempting to update the `PDFLinkService`-pagesRefCache in that case.	2021-11-21 11:53:19 +01:00
Jonas Jenwald	0ebac67a9f	Remove the `{BaseViewer, PDFThumbnailViewer}._pagesRequests` caches In the `BaseViewer` this cache is mostly relevant in the `disableAutoFetch = true` mode, since the pages are being initialized lazily in that case. In the `PDFThumbnailViewer` this cache is mostly used for thumbnails that are actually being rendered, as opposed to those created directly from the "regular" pages. Please note that I'm not suggesting that we remove these caches because they're only used in some situations, but rather because they're for all intents and purposes actually redundant. In the API itself, we're already caching both the page-promises and the actual pages themselves on the `WorkerTransport`-instance. Hence these viewer-caches aren't really necessary in practice, and add what to me mostly seems like an unnecessary level of indirection.[1] Given that the viewer now relies on caching in the API itself, this patch also adds a new unit-test to ensure that page-caching works (and keeps working) as expected. --- [1] In the `WorkerTransport.getPage`-method the parameter is being validated on every call, but that's hardly enough code to warrant keeping the "duplicate" caches in the viewer in my opinion.	2021-11-21 11:40:45 +01:00
Tim van der Meij	aabd4e5092	Merge pull request #14294 from Snuffleupagus/getStats-refactor [api-minor] Replace `PDFDocumentProxy.getStats` with a synchronous `PDFDocumentProxy.stats` getter	2021-11-20 15:42:46 +01:00
Jonas Jenwald	6da0944fc7	[api-minor] Replace `PDFDocumentProxy.getStats` with a synchronous `PDFDocumentProxy.stats` getter Please note: These changes will primarily benefit longer documents, somewhat at the expense of e.g. one-page documents. The existing `PDFDocumentProxy.getStats` function, which in the default viewer is called for each rendered page, requires a round-trip to the worker-thread in order to obtain the current document stats. In the default viewer, we currently make one such API-call for every rendered page. This patch proposes replacing that method with a synchronous `PDFDocumentProxy.stats` getter instead, combined with re-factoring the worker-thread code by adding a `DocStats`-class to track Stream/Font-types and only send them to the main-thread the first time that a type is encountered. Note that in practice most PDF documents only use a fairly limited number of Stream/Font-types, which means that in longer documents most of the `PDFDocumentProxy.getStats`-calls will return the same data.[1] This re-factoring will obviously benefit longer documents the most[2], and could actually be seen as a regression for one-page documents, since in practice there'll usually be a couple of "DocStats" messages sent during the parsing of the first page. However, if the user zooms/rotates the document (which causes re-rendering), note that even a one-page document would start to benefit from these changes. Another benefit of having the data available/cached in the API is that unless the document stats change during parsing, repeated `PDFDocumentProxy.stats`-calls will return the same identical object. This is something that we can easily take advantage of in the default viewer, by now only reporting "documentStats" telemetry[3] when the data actually have changed rather than once per rendered page (again beneficial in longer documents). --- [1] Furthermore, the maximum number of `StreamType`/`FontType` are `10` and `12` respectively, which means that regardless of the complexity and page count in a PDF document there'll never be more than twenty-two "DocStats" messages sent; see `41ac3f0c07/src/shared/util.js (L206-L232)` [2] One example is the `pdf.pdf` document in the test-suite, where rendering all of its 1310 pages only results in a total of seven "DocStats" messages being sent from the worker-thread. [3] Reporting telemetry, in Firefox, includes using `JSON.stringify` on the data and then sending an event to the `PdfStreamConverter.jsm`-code. In that code the event is handled and `JSON.parse` is used to retrieve the data, and in the "documentStats"-case we'll then iterate through the data to avoid double-reporting telemetry; see https://searchfox.org/mozilla-central/rev/8f4c180b87e52f3345ef8a3432d6e54bd1eb18dc/toolkit/components/pdfjs/content/PdfStreamConverter.jsm#515-549	2021-11-20 12:20:55 +01:00
Tim van der Meij	41ac3f0c07	Merge pull request #14291 from Snuffleupagus/force-postMessageTransfers [api-minor] Only use Workers when `postMessage` transfers are supported (PR 11123 follow-up)	2021-11-19 20:02:51 +01:00
Tim van der Meij	b1e9e214bf	Merge pull request #14229 from brendandahl/term-log Add an easy way to log to the terminal during browser tests.	2021-11-19 19:48:59 +01:00
Brendan Dahl	c6cb39ef30	Merge pull request #14262 from Snuffleupagus/issue-14261 Include the /Lang-property, when it exists, in the StructTree-data (issue 14261)	2021-11-19 07:51:21 -08:00
Jonas Jenwald	6f22327e61	[api-minor] Only use Workers when `postMessage` transfers are supported (PR 11123 follow-up) Given that all modern browsers now support `postMessage` transfers, and have for years, it no longer seems necessary for the PDF.js library to support using Workers unless the `postMessage` transfers functionality is available. This patch is a follow-up to PR 11123, which made it impossible to manually disable `postMessage` transfers for performance reasons (since it increases memory usage), which hasn't caused any bug reports as far as I know.[1] Hence we'll now only support proper Worker implementations, with fully working `postMessage` transfers, and fall back to using "fake" Workers otherwise. --- [1] At the time of that PR we still "supported" IE, which is why this code was left intact.	2021-11-19 16:47:58 +01:00
Brendan Dahl	052db56a2e	Add an easy way to log to the terminal during browser tests. On the main thread call `driver.log` and the message will output in the terminal with the pdf id and the message. I've been using this a lot when trying to find certain PDFs or logging stats.	2021-11-18 15:38:56 -08:00
Brendan Dahl	9f4a2cf5ce	Merge pull request #14276 from Snuffleupagus/issue-14242-2 Only show the `loadingIcon`-spinner on visible pages (issue 14242)	2021-11-18 13:43:58 -08:00
Tim van der Meij	3dccaccbb4	Merge pull request #14278 from Snuffleupagus/rm-removeChild Replace the remaining `Node.removeChild()` instances with `Element.remove()`	2021-11-17 20:17:55 +01:00
Tim van der Meij	f90eebd282	Merge pull request #14280 from Snuffleupagus/scrollMode-PAGE-spread-loop Slightly optimize `spreadMode` toggling with `ScrollMode.PAGE` set (PR 14112 follow-up)	2021-11-17 19:46:30 +01:00
Jonas Jenwald	4e2c2fafc9	Enable the `unicorn/prefer-dom-node-remove` ESLint plugin rule Please see https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-dom-node-remove.md	2021-11-16 17:52:50 +01:00
Jonas Jenwald	4ef1a129fa	Replace the remaining `Node.removeChild()` instances with `Element.remove()` Using `Element.remove()` is a slightly more compact way of removing an element, since you no longer need to explicitly find/use its parent element. Furthermore, the patch also replaces a couple of loops that're used to delete all elements under a node with simply overwriting the contents directly (a pattern already used throughout the viewer). See also: - https://developer.mozilla.org/en-US/docs/Web/API/Node/removeChild - https://developer.mozilla.org/en-US/docs/Web/API/Element/remove	2021-11-16 17:52:50 +01:00
Brendan Dahl	3209c013c4	Merge pull request #14247 from calixteman/button [api-minor] Render pushbuttons on their own canvas (bug 1737260)	2021-11-16 08:10:40 -08:00
Jonas Jenwald	1214c056e9	Slightly optimize `spreadMode` toggling with `ScrollMode.PAGE` set (PR 14112 follow-up) It shouldn't be necessary to iterate through all pages when using a non-default `spreadMode`, since we already know which page(s) should become visible. This code is a left-over from the initial (local) implementation that resulted in PR 14112, however I forgot to clean-up some things such as e.g. this loop. Also fixes an outdated comment, see PR 14204 which removed the mentioned data-structure.	2021-11-16 15:37:58 +01:00
Jonas Jenwald	7d4c37e988	Use the new iterator in the `PDFPageViewBuffer` unit-tests The previous patch introduced an iterator in the `PDFPageViewBuffer`-class, hence the test-only `_buffer`-getter is no longer necessary.	2021-11-15 14:06:17 +01:00
Jonas Jenwald	e909fcdba8	Only show the `loadingIcon`-spinner on visible pages (issue 14242) This patch preserves the old behaviour of appending a `loadingIcon`-div to all pages that are not yet loaded/rendered. However, the actual `loadingIcon`-spinner (i.e. the `loading-icon.gif` image) will only be displayed on visible pages to improve performance. To avoid having to iterate through all pages in the document, which doesn't seem like a good idea for a PDF document with thousands of pages, we use a combination of the currently visible and cached pages to toggle the `loadingIcon`-spinner.	2021-11-15 14:06:14 +01:00
Tim van der Meij	e4f97a2a91	Merge pull request #14273 from Snuffleupagus/update-packages Update packages and translations	2021-11-14 15:09:31 +01:00
Jonas Jenwald	971ac8e993	Include the /Lang-property, when it exists, in the StructTree-data (issue 14261) Please note: This is a tentative patch, since I don't have the necessary a11y-software to actually test it.	2021-11-14 12:37:41 +01:00
Jonas Jenwald	a54bed4963	Enable the ESLint `no-loss-of-precision` rule Please refer to https://eslint.org/docs/rules/no-loss-of-precision	2021-11-14 10:48:50 +01:00
Jonas Jenwald	c47f5e81fe	Update l10n files	2021-11-14 10:48:50 +01:00
Jonas Jenwald	04bdc26d3a	Update the `eslint-plugin-unicorn` package to the latest version	2021-11-14 10:27:29 +01:00
Jonas Jenwald	1dd74efb0f	Update the `eslint-plugin-no-unsanitized` package to the latest version	2021-11-14 10:24:41 +01:00
Jonas Jenwald	bd1e140e2a	Update the `dommatrix` package to the latest version	2021-11-14 10:20:54 +01:00
Jonas Jenwald	9f6d37263c	Update npm packages	2021-11-14 10:17:30 +01:00
Jonas Jenwald	712621b508	Merge pull request #14255 from Snuffleupagus/GrabToPan-class Convert `GrabToPan` to a standard `class`	2021-11-13 23:24:36 +01:00
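
The circular-reference guard described in commit a807ffe907 can be illustrated with a rough sketch. This is not the actual `XRef.fetch` code: `TinyXRef`, its `pending` set, and the `{ ref: ... }` entry shape are hypothetical names made up for this example. It only shows the general idea of tracking in-flight lookups inside a synchronous fetch method and returning dummy data (here `null`) when a cycle is detected, so that the rest of the document can still be parsed.

```javascript
// Hypothetical, simplified stand-in for an XRef-like table (an assumption
// for illustration only, not the pdf.js implementation).
class TinyXRef {
  constructor(entries) {
    // Map from a reference key to either a plain value or `{ ref: key }`.
    this.entries = entries;
    // References whose lookup has started but not yet finished.
    this.pending = new Set();
  }

  fetch(ref) {
    if (this.pending.has(ref)) {
      // We are already resolving `ref` further up the call stack, so this
      // is a circular reference: return dummy data instead of throwing.
      return null;
    }
    this.pending.add(ref);
    try {
      return this.resolve(this.entries.get(ref));
    } finally {
      // Safe because everything here is synchronous: the set only ever
      // contains the references on the current call stack.
      this.pending.delete(ref);
    }
  }

  resolve(value) {
    if (value !== null && typeof value === "object" && "ref" in value) {
      return this.fetch(value.ref);
    }
    return value;
  }
}

const table = new TinyXRef(
  new Map([
    ["a", { ref: "b" }],
    ["b", { ref: "a" }], // "a" and "b" point at each other
    ["c", 42],
    ["d", { ref: "c" }],
  ])
);
console.log(table.fetch("a")); // null (cycle detected)
console.log(table.fetch("d")); // 42
```

Keeping the bookkeeping inside the fetch method means callers do not have to track visited references themselves, which is the trade-off the commit message describes.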


15073 Commits
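
The /Count short-cut described in commit d0c4bbd828 can be sketched in the same spirit. The tree shape (`kids` arrays with leaf objects) and the helper names `leaves`, `getPage`, `countPages` and `validatedPageCount` are assumptions made for this example and do not mirror the pdf.js `Catalog` code. As in the commit message, a declared count is accepted as-is when the last page it implies can be fetched; otherwise the whole tree is walked. The sketch does not guard against cycles in the tree.

```javascript
// Yield every leaf (page) of a hypothetical page tree, depth-first.
function* leaves(node) {
  if (!Array.isArray(node.kids)) {
    yield node;
    return;
  }
  for (const kid of node.kids) {
    yield* leaves(kid);
  }
}

// Return the page at a zero-based index, or throw if it does not exist.
function getPage(root, index) {
  let i = 0;
  for (const leaf of leaves(root)) {
    if (i++ === index) {
      return leaf;
    }
  }
  throw new Error(`Page index ${index} not found.`);
}

// Slow path: walk the entire tree and count the pages.
function countPages(root) {
  let count = 0;
  for (const _leaf of leaves(root)) {
    count++;
  }
  return count;
}

// Fast path first: trust the declared count only if its last page exists.
function validatedPageCount(root, declaredCount) {
  if (Number.isInteger(declaredCount) && declaredCount > 0) {
    try {
      getPage(root, declaredCount - 1);
      return declaredCount;
    } catch {
      // Fall through to counting the pages.
    }
  }
  return countPages(root);
}

const tree = { kids: [{ kids: [{ id: 1 }, { id: 2 }] }, { id: 3 }] };
console.log(validatedPageCount(tree, 3)); // 3
console.log(validatedPageCount(tree, 1000)); // 3
```

Non-integer declared values skip the fast path entirely, which corresponds to the "completely invalid /Count" case mentioned at the end of the commit message.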