pdf.js

Author	SHA1	Message	Date
Tim van der Meij	95f9075565	Optimize `TextLayerRenderTask._layoutText` to avoid intermediate string creation This method creates quite a few intermediate strings on each call and it's called often, even for smaller documents like the Tracemonkey document. Scrolling from top to bottom in that document resulted in 12936 strings being created in this method. With this commit applied, this is reduced to 3610 strings.	2018-12-30 14:39:08 +01:00
Tim van der Meij	d5e5d18430	Convert the `PDFDocument` class in `src/core/document.js` to ES6 syntax	2018-12-30 13:54:43 +01:00
Tim van der Meij	612fc9fcc2	Convert the `Page` class in `src/core/document.js` to ES6 syntax	2018-12-30 13:54:43 +01:00
Tim van der Meij	aad27ff9a0	Optimize the `Ref` class in `src/core/primitives.js` The `toString` method always creates two string objects (for the 'R' character and for the `num` concatenation) and in the worst case creates three string objects (one more for the `gen` concatenation). For the Tracemonkey paper alone, this resulted in 12000 string objects when scrolling from the top to the bottom of the document. Since this is a hot function, it's worth minimizing the number of string objects, especially for large documents, to reduce peak memory usage. This commit refactors the `toString` method to always create only one string object.	2018-12-29 17:48:41 +01:00
Jonas Jenwald	60bcce184e	Check that the first page can be successfully loaded, to try and ascertain the validity of the XRef table (issue 7496, issue 10326) For PDF documents with sufficiently broken XRef tables, it's usually quite obvious when you need to fallback to indexing the entire file. However, for certain kinds of corrupted PDF documents the XRef table will, for all intents and purposes, appear to be valid. It's not until you actually try to fetch various objects that things will start to break, which is the case in the referenced issues[1]. Since there's generally a real effort being made in PDF.js to load even corrupt PDF documents, this patch contains a suggested approach to attempt to do a bit more validation of the XRef table during the initial document loading phase. Here the choice is made to attempt to load the first page, as a basic sanity check of the validity of the XRef table. Please note that attempting to load a more-or-less arbitrarily chosen object without any context of what it's supposed to be isn't very useful, which is why this particular choice was made. Obviously, just because the first page can be loaded successfully that doesn't guarantee that the entire XRef table is valid, however if even the first page fails to load you can be reasonably sure that the document is not valid[2]. Even though this patch won't cause any significant increase in the amount of parsing required during initial loading of the document[3], it will require loading of more data upfront which thus delays the initial `getDocument` call. Whether or not this is a problem depends very much on what you actually measure, please consider the following examples: ```javascript console.time('first'); getDocument(...).promise.then((pdfDocument) => { console.timeEnd('first'); }); console.time('second'); getDocument(...).promise.then((pdfDocument) => { pdfDocument.getPage(1).then((pdfPage) => { // Note: the API uses `pageNumber >= 1`, the Worker uses `pageIndex >= 0`. 
console.timeEnd('second'); }); }); ``` The first case is pretty much guaranteed to show a small regression, however the second case won't be affected at all since the Worker caches the result of `getPage` calls. Again, please remember that the second case is what matters for the standard PDF.js use-case which is why I'm hoping that this patch is deemed acceptable. --- [1] In issue 7496, the problem is that the document is edited without the XRef table being correctly updated. In issue 10326, the generator was sorting the XRef table according to the offsets rather than the objects. [2] The idea of checking the first page in particular came from the "standard" use-case for the PDF.js library, i.e. the default viewer, where a failure to load the first page basically means that nothing will work; note how `{BaseViewer, PDFThumbnailViewer}.setDocument` depends completely on being able to fetch the first page. [3] The only extra parsing is caused by, potentially, having to traverse part of the `Pages` tree to find the first page.	2018-12-29 12:47:25 +01:00
Tim van der Meij	360c3d3813	Remove the unused `url` argument for the `ChunkedStreamManager` class	2018-12-24 13:14:42 +01:00
Tim van der Meij	47344197f4	Convert `src/core/chunked_stream.js` to ES6 syntax	2018-12-24 13:14:42 +01:00
Tim van der Meij	103f4616ac	Merge pull request #10334 from Snuffleupagus/OpenAction-dest [api-minor] Add support for OpenAction destinations (issue 10332)	2018-12-23 20:49:50 +01:00
Jonas Jenwald	f0719ed565	[api-minor] Change the `getViewport` method, on `PDFPageProxy`, to take a parameter object rather than a bunch of (randomly) ordered parameters If, as PR 10368 suggests, more parameters should be added to `getViewport` I think that it would be a mistake to not change the signature first to avoid needlessly unwieldy call-sites. To not break any existing code and third-party use-cases, this is obviously implemented with a deprecation warning and with a working fallback[1] for the old method signature. --- [1] This is limited to `GENERIC` builds, which should be sufficient.	2018-12-21 11:55:20 +01:00
Jonas Jenwald	b05f053287	[api-minor] Add support for OpenAction destinations (issue 10332) Note that the OpenAction dictionary may contain other information besides just a destination array, e.g. instructions for auto-printing[1]. Given first of all that an arbitrary `Dict` cannot be sent from the Worker (since cloning would fail), and second of all that the data obviously needs to be validated, this patch purposely only adds support for fetching a destination from the OpenAction entry[2]. --- [1] This information is, currently in PDF.js, being included through the `getJavaScript` API method. [2] This significantly reduces the complexity of the implementation, which seems fine for now. If there's ever need for other kinds of OpenAction to be fetched, additional API methods could/should be implemented as necessary (could e.g. follow the `getOpenActionWhatever` naming scheme).	2018-12-19 11:45:16 +01:00
Jonas Jenwald	ba2edeae18	[api-minor] Add support, in `getMetadata`, for custom information dictionary entries (issue 5970, issue 10344) (#10346 ) The custom entries, provided that they exist and that their types are safe to include, are exposed through a new `Custom` infoDict entry to clearly separate them from the standard ones. Fixes 5970. Fixes 10344.	2018-12-18 23:26:02 +01:00
Jonas Jenwald	437fb8a8a7	Ignore the `fieldValue` for Signature annotations, since they're currently unsupported (issue 10374) Given that Signature (Widget) annotations are currently not supported, since they cannot be validated, simply ignoring the `fieldValue` seems OK for now considering that attempting to blindly include unparsed/unvalidated data isn't very useful. Fixes 10347.	2018-12-12 18:01:43 +01:00
Tim van der Meij	45c0197465	Merge pull request #10330 from janpe2/svg-line-width-zero Handle line width of zero in SVG	2018-12-07 23:34:27 +01:00
Jani Pehkonen	ddabeb0645	Handle line width of zero in SVG	2018-12-04 16:05:32 +02:00
Jonas Jenwald	d0fec7c6fb	Fix `NameOrNumberTree.get` to actually perform a binary search to find the requested key The intent of the code, based on existing comments, is to perform a binary search. However, because of what appears to be a typo in the code responsible for computing the current search index, this code is always checking every entry (albeit only at the "final" node) starting from the last one.	2018-11-23 23:52:33 +01:00
Jonas Jenwald	fdad0a0b0b	Fallback to an exhaustive search, in corrupt PDF files, for NameTrees/NumberTrees that are not correctly ordered (issue 10272) According to the specification, see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.2384179, the keys of NameTree/NumberTree should be ordered. For corrupt PDF files, which violate this assumption, we thus need to fallback to an exhaustive search in order to e.g. find all destinations. Please note: Given that this only implements a fallback for the "final" node of the Tree, there's obviously a risk that the patch isn't sufficient for dealing with all kinds of out-of-order corruption. However, this kind of problem should be rare in practice, and without a real-world test-case it's difficult to implement a completely general solution (and there's obviously a question if you'd even want to).	2018-11-20 17:50:47 +01:00
Jani Pehkonen	9e990f6f3e	Repair CFF fonts if stem hints are in wrong order	2018-11-20 18:50:37 +02:00
Jonas Jenwald	ac6b94c9dd	Replace the remaining occurrences, in `src/display/api.js`, of `var` with `let`/`const`	2018-11-18 19:08:27 +01:00
Jonas Jenwald	061f7bd2f3	Convert `PDFWorker`, in `src/display/api.js`, to an ES6 class Also changes all occurrences of `var` to `let`/`const` in this code.	2018-11-18 19:08:27 +01:00
Jonas Jenwald	02e77a39ec	Convert `InternalRenderTask`, in `src/display/api.js`, to an ES6 class This changes all occurrences of `var` to `let`/`const` in this code, and updates the signature of the constructor to use object destructuring for better readability (and self documentation). Also, `useRequestAnimationFrame` is changed to a parameter and the `typeof window` check is now done once rather than at every `_scheduleNext` call.	2018-11-18 19:08:27 +01:00
Jonas Jenwald	5a0d64a6de	Convert `PDFPageProxy`, in `src/display/api.js`, to an ES6 class This changes all occurrences of `var` to `let`/`const` in this code, and updates the signatures of a couple of methods to use object destructuring. Finally, when creating `InternalRenderTask` instances only the necessary parameters are now provided, since passing through the `RenderParameters` as-is seems completely unnecessary.	2018-11-18 19:08:25 +01:00
Jonas Jenwald	2c003a82d5	Convert `RenderTask`, in `src/display/api.js`, to an ES6 class Also deprecates the `then` method, in favour of the `promise` getter.	2018-11-18 19:08:00 +01:00
Jonas Jenwald	ef8e5fd77c	Convert `PDFDocumentLoadingTask`, in `src/display/api.js`, to an ES6 class Also deprecates the `then` method, in favour of the `promise` getter.	2018-11-18 19:07:57 +01:00
PalmerAL	5f15dc2023	Use `span` instead of `div` in the text layer This improves copy/pasting text content since it reduces the amount of unnecessary newlines.	2018-11-18 15:54:08 +01:00
Tim van der Meij	2194aef03e	Merge pull request #10238 from Snuffleupagus/interfaces Move the `interface` definitions out of `src/core/worker.js` and into their own file	2018-11-08 23:28:46 +01:00
Jonas Jenwald	4829f567c1	Move the `interface` definitions out of `src/core/worker.js` and into their own file These interfaces are already used in different files, in both the `src/core/` and `src/display/` folders, and having them reside in their own file seems a lot clearer and is also similar to the existing viewer interfaces. As part of moving the `interface` definitions, they're also converted to ES6 classes.	2018-11-08 13:21:37 +01:00
Jonas Jenwald	60da2d882b	[api-minor] Refactor/simplify the `PDFObjects` class First of all, note how there are currently two methods for checking if a certain object exists, which seems completely unwarranted. Furthermore, the rarely used `getData` method was removed and its only callsite changed to use a combination of `PDFObjects.{has, get}` instead. Finally, the methods were rearranged slightly, to bring the most important ones (for an API user) to the top of the class.	2018-11-08 10:13:39 +01:00
Jonas Jenwald	d32321d84f	Convert `PDFObjects`, in `src/display/api.js`, to an ES6 class Also changes all occurrences of `var` to `const`, and marks internal properties/methods as "private".	2018-11-08 10:11:40 +01:00
Tim van der Meij	3e342554d1	Merge pull request #10228 from morille/patch-2 Don't detect nw.js as node.js	2018-11-07 23:51:01 +01:00
Romain Petit	13b0ca6b2a	Don't detect nw.js as node.js nw.js is chrome plus nodejs. It will succeed everywhere chrome succeeds, but fail in many cases where nodejs succeeds (see issue 9071). So it's safer to consider it as a browser context rather than a nodejs context. Make travis happy again CS Readability + Explanation The relevant portion of the NW.js documentation: http://docs.nwjs.io/en/latest/For%20Users/Advanced/JavaScript%20Contexts%20in%20NW.js/#access-nodejs-and-nwjs-api-in-browser-context Added full link to relevant doc.	2018-11-07 11:14:22 +01:00
Jonas Jenwald	a963d139dc	Convert `src/core/ps_parser.js` to use ES6 classes Besides being a fairly small and self-contained file, this code also shows a possible way of defining static constants on classes.	2018-11-03 17:43:06 +01:00
Tim van der Meij	ec76aa531e	Merge pull request #10202 from Snuffleupagus/issue-10200 Attempt to clean-up/restore pending rendering operations on `RenderTask.cancel` (issue 10200)	2018-11-02 23:11:47 +01:00
Jonas Jenwald	f23dba1c10	Change `canvasInRendering` to a `WeakSet` instead of a `WeakMap` Note how nowhere in the code `canvasInRendering.get()` is ever called, and that this structure is really only used to store references to `<canvas>` DOM elements. The reason for this being a `WeakMap` is probably because at the time we weren't using `core-js` polyfills yet, and since there already existed a manually implemented `WeakMap` polyfill it was probably simpler to use that.	2018-10-31 18:15:23 +01:00
Jonas Jenwald	f77b463339	Attempt to clean-up/restore pending rendering operations on `RenderTask.cancel` (issue 10200) Please note that, given the lack of a runnable example, I'm not totally sure if this first of all is enough to completely address the issue as filed and second of all if we actually want this new behaviour.	2018-10-31 16:22:17 +01:00
Tim van der Meij	ed4ac1bc67	Merge pull request #10162 from janpe2/svg-normalize-bbox Normalize BBox of form XObjects in SVG back-end	2018-10-28 13:18:48 +01:00
Jani Pehkonen	9cd5f94f03	Normalize the BBox of form XObjects on the /core side	2018-10-22 14:17:05 +03:00
Jonas Jenwald	5bb7f4b615	Convert `PDFDataRangeTransport` to an ES6 class	2018-10-20 17:15:27 +02:00
Tim van der Meij	d21892933d	Merge pull request #10161 from Snuffleupagus/DataLoaded-onProgress Ensure that `onProgress` is always called when the entire PDF file has been loaded, regardless of how it was fetched (issue 10160)	2018-10-20 15:22:05 +02:00
Jonas Jenwald	54f9883c51	Export `CMapCompressionType` and `PermissionFlag` on the `pdfjsLib` object (issue 10148, PR 10033 follow-up) `CMapCompressionType` makes a lot of sense to export, for anyone attempting to implement a custom `CMapReaderFactory`; fixes 10148. `PermissionFlag` likewise needs to be exported, since otherwise the result of the `getPermissions` API method becomes difficult to interpret; follow-up to 10033.	2018-10-20 11:38:00 +02:00
Jonas Jenwald	327f2eb588	Ensure that `onProgress` is always called when the entire PDF file has been loaded, regardless of how it was fetched (issue 10160) Please note: I'm totally fine with this patch being rejected, and the issue closed as WONTFIX; however these changes should address the issue if that's desired. From a conceptual point of view, reporting loading progress doesn't really make a lot of sense for PDF files opened by passing raw binary data directly to `getDocument` (since obviously all data was loaded). This is compared to PDF files loaded via e.g. `XMLHttpRequest` or the Fetch API, where the entire PDF file isn't available from the start and knowing the loading progress makes total sense. However I can certainly see why the current API could be considered inconsistent, which isn't great, since a registered `onProgress` callback will never be called for certain `getDocument` calls. The simplest solution to this inconsistency thus seems to be to ensure that `onProgress` is always called when handling the `DataLoaded` message, since that will always be dispatched[1] from the worker-thread. --- [1] Note that this isn't guaranteed to happen, since setting `disableAutoFetch = true` often prevents the entire file from ever loading. However, this isn't relevant for the issue at hand, and is a well-known consequence of using `disableAutoFetch = true`; note how the default viewer even has a specialized code-path for hiding the loadingBar.	2018-10-16 13:51:12 +02:00
Jonas Jenwald	4cde844ffe	Add a `DOMTokenList.toggle` polyfill for the second, optional, "force" parameter This is based on the polyfill available at https://developer.mozilla.org/en-US/docs/Web/API/Element/classList#Polyfill	2018-10-12 15:41:09 +02:00
Tim van der Meij	9e9426c354	Merge pull request #10143 from Snuffleupagus/getMainThreadWorkerMessageHandler-catch-errors Ensure that `getMainThreadWorkerMessageHandler` won't accidentally break `getDocument` (PR 10139 follow-up)	2018-10-11 00:05:01 +02:00
Jonas Jenwald	0e2c6047e4	Ensure that `getMainThreadWorkerMessageHandler` won't accidentally break `getDocument` (PR 10139 follow-up) This should have been part of PR 10139. In the event that a user has attempted to manually load the worker file on the main-thread, but somehow failed to do that correctly, there's a possibility that `getMainThreadWorkerMessageHandler` could throw. Considering how/where that helper function is being called, an error could still prevent `PDFDocumentLoadingTask` from completing (regardless if it's being resolved/rejected).	2018-10-09 15:44:31 +02:00
Jonas Jenwald	21c8dd4842	Combine the `pdfjsFilePath` and fallback `workerSrc` handling in `src/display/api.js` With the way that the `getWorkerSrc()` helper function is implemented now, there's no longer a particularly strong reason for keeping the global `pdfjsFilePath` variable around. With this patch the fallback `workerSrc` will thus, assuming it wasn't already set, be set to the "pdfjsFilePath" which simplifies the `getWorkerSrc()` function and reduces the amount of global state. Finally, the global `workerSrc` variable was renamed to prevent shadowing.	2018-10-09 13:47:48 +02:00
Tim van der Meij	f45e46d7ad	Merge pull request #10133 from kevinleedrum/fix-content-length Set returnValues.suggestedLength to Content-Length if integer	2018-10-09 00:05:57 +02:00
Kevin Lee Drum	4cf10ac79d	set returnValues.suggestedLength to Content-Length if integer	2018-10-07 13:26:29 -04:00
Jonas Jenwald	755c6edc5e	Ensure that the `PDFDocumentLoadingTask` is rejected when "setting up fake worker" failed (issue 10135) This should, hopefully, cover all the possible ways[1] in which "fake workers" are loaded. Given the different code-paths, adding unit-tests might not be that simple. Note that in order to make this work, the various `fakeWorkerFilesLoader` functions were converted to return `Promises`. --- [1] Unfortunately there's lots of them, for various build targets and configurations.	2018-10-06 13:18:51 +02:00
Simon Leblanc	b5806735d8	Add support of Ink annotation	2018-10-03 00:28:49 +02:00
Tim van der Meij	138324502c	Merge pull request #10119 from Snuffleupagus/rm-onFileAttachmentAnnotation Attempt to simplify the `fileattachmentannotation` event dispatching	2018-10-02 23:25:22 +02:00
Jonas Jenwald	d60ce998f1	Attempt to simplify the `fileattachmentannotation` event dispatching This attempts to reduce the level of indirection, and the amount of code, when dispatching `fileattachmentannotation` events, by removing the `PDFLinkService.onFileAttachmentAnnotation` method and just accessing `PDFLinkService.eventBus` directly in the `FileAttachmentAnnotationElement` constructor. Given that other properties, such as `externalLinkTarget`/`externalLinkRel`, are already being accessed directly this pattern seems fine here as well.	2018-10-01 15:09:08 +02:00
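
Note on commits d0fec7c6fb and fdad0a0b0b: the two `NameOrNumberTree` messages describe a binary search over ordered keys, plus a fallback to an exhaustive search when a corrupt file stores the keys out of order. The sketch below illustrates that combination only. It is not the pdf.js implementation; the function name `findInLeaf` and the flat `[key, value, key, value, ...]` layout of the leaf are assumptions made for the example.

```javascript
// Hypothetical helper, NOT the pdf.js code: look up `key` in one leaf node.
// Assumption: `entries` is a flat array of alternating keys and values,
// i.e. [k0, v0, k1, v1, ...], and keys are comparable with `<` and `>`.
function findInLeaf(entries, key) {
  // Binary search over the pairs, which is only correct if keys are ordered.
  let lo = 0;
  let hi = entries.length / 2 - 1; // index of the last key/value pair
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const currentKey = entries[2 * mid];
    if (key < currentKey) {
      hi = mid - 1;
    } else if (key > currentKey) {
      lo = mid + 1;
    } else {
      return entries[2 * mid + 1];
    }
  }

  // Fallback for out-of-order (corrupt) data: check every pair once.
  for (let i = 0; i < entries.length; i += 2) {
    if (entries[i] === key) {
      return entries[i + 1];
    }
  }
  return null; // key is not present in this leaf
}
```

With ordered keys the loop returns after a logarithmic number of comparisons. The linear scan only runs when the binary search misses, so well-formed files pay for it only on lookups of keys that are absent.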


4573 Commits