Sakurai/pdf.js - pdf.js - Gitea on kemo

Author	SHA1	Message	Date
calixteman	84d7cccb1d	JS - Handle correctly hierarchy of fields (#13133) * JS - Handle correctly hierarchy of fields - it aims to fix #13132; - annotations can inherit their actions from the parent field; - there are some fields which act as a container for other fields: - they can be accessed through js so need to add them with an empty type (nothing in the spec about that but checked in Acrobat); - calculation order list (CO) can reference them so need to make them available through this.getField; - getArray method must return kids. - field values are number, string, ... depending on their type but nothing in the spec on how to know what's the type: - according to the comment for Canonical Format: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=461 - it seems that this "type" can be guessed from js action Format (when setting a type in Acrobat DC, the only affected thing is this action). - util.scand with an empty string returns the current date.	2021-03-30 08:50:35 -07:00
calixteman	24e598a895	XFA - Add a layer to display XFA forms (#13069) - add an option to enable XFA rendering if any; - for now, keep the canvas layer: it could be useful to implement XFAF forms (embedded pdf in xml stream for the background and xfa form for the foreground); - ui elements in template DOM are pretty close to their html counterpart so we generate a fake html DOM from template one: - it makes it easier to translate template properties to html ones; - it makes the creation of the html element in the main thread faster.	2021-03-19 10:11:40 +01:00
Calixte Denizet	ffd4bc790c	JS -- Add tests for print/save actions * change PDFDocument::hasJSActions to return true when there are JS actions in catalog.	2020-12-24 18:51:00 +01:00
Calixte Denizet	1e2173f038	JS - Collect and execute actions at doc and pages level * the goal is to execute actions like Open or OpenAction * can be tested with issue6106.pdf (auto-print) * once #12701 is merged, we can add page actions	2020-12-18 20:03:59 +01:00
Calixte Denizet	03814bd6a2	Don't use 'in' operator to check if key is in a Map	2020-12-16 16:00:12 +01:00
Tim van der Meij	00b4f86db3	Merge pull request #12717 from Snuffleupagus/issue-12714 Ensure that the /Annots-entry, on /Page-instances, is actually an Array (issue 12714)	2020-12-10 23:06:59 +01:00
Calixte Denizet	25bf504ff5	Be sure that CalculationOrder is either null or a non-empty array	2020-12-10 16:02:11 +01:00
Jonas Jenwald	796a0d3155	Ensure that the /Annots-entry, on /Page-instances, is actually an Array (issue 12714) In the referenced PDF document, the second and third page has corrupt /Annots-entries which contain /Dict-data rather than the intended Arrays.	2020-12-10 11:42:00 +01:00
Calixte Denizet	b11592a756	JS -- hidden annotations must be built in case a script shows them * in some pdfs, there are actions with "event.source.hidden = ..." * in order to handle visibility when printing, annotationStorage is extended to store multiple properties (value, hidden, editable, ...)	2020-11-10 12:48:34 +01:00
Calixte Denizet	a5279897a7	JS -- Add listener for sandbox events only if there are some actions * When no actions then set it to null instead of empty object * Even if a field has no actions, it needs to listen to events from the sandbox in order to be updated if an action changes something in it.	2020-11-09 18:37:59 +01:00
Jonas Jenwald	082cd8fc6c	Add global caching, for /Resources without blend modes, and use it to reduce repeated fetching/parsing in `PartialEvaluator.hasBlendModes` The `PartialEvaluator.hasBlendModes` method is necessary to determine if there's any blend modes on a page, which unfortunately requires synchronous parsing of the /Resources of each page before its rendering can start (see the "StartRenderPage"-message). In practice it's not uncommon for certain /Resources-entries to be found on more than one page (referenced via the XRef-table), which thus leads to unnecessary re-fetching/re-parsing of data in `PartialEvaluator.hasBlendModes`. To improve performance, especially in pathological cases, we can cache /Resources-entries when it's absolutely clear that they do not contain any blend modes at all[1]. This way, subsequent `PartialEvaluator.hasBlendModes` calls can be made significantly more efficient. This patch was tested using the PDF file from issue 6961, i.e. https://github.com/mozilla/pdf.js/files/121712/test.pdf: ``` [ { "id": "issue6961", "file": "../web/pdfs/issue6961.pdf", "md5": "a80e4357a8fda758d96c2c76f2980b03", "rounds": 100, "type": "eq" } ] ``` which gave the following results when comparing this patch against the `master` branch: ``` -- Grouped By browser, page, stat -- browser \| page \| stat \| Count \| Baseline(ms) \| Current(ms) \| +/- \| % \| Result(P<.05) ------- \| ---- \| ------------ \| ----- \| ------------ \| ----------- \| ---- \| ------ \| ------------- firefox \| 0 \| Overall \| 100 \| 1034 \| 555 \| -480 \| -46.39 \| faster firefox \| 0 \| Page Request \| 100 \| 489 \| 7 \| -482 \| -98.67 \| faster firefox \| 0 \| Rendering \| 100 \| 545 \| 548 \| 2 \| 0.45 \| firefox \| 1 \| Overall \| 100 \| 912 \| 428 \| -484 \| -53.06 \| faster firefox \| 1 \| Page Request \| 100 \| 487 \| 1 \| -486 \| -99.77 \| faster firefox \| 1 \| Rendering \| 100 \| 425 \| 427 \| 2 \| 0.51 \| ``` --- [1] In the case where blend modes are found, it becomes a lot more difficult to know if it's generally safe to skip /Resources-entries. Hence we don't cache anything in that case, however note that most document/pages do not utilize blend modes anyway.	2020-11-05 16:59:08 +01:00
Jonas Jenwald	b44a975d7c	Prevent issues, in `PDFDocument.fieldObjects`, for invalid Annotations For an invalid Annotation, there's one code-path where `undefined` is returned from `AnnotationFactory._create`. That'd currently, incorrectly, trigger an error during the `PDFDocument._collectFieldObjects` parsing which thus seems good to avoid. Along these lines, the filtering in `PDFDocument.fieldObjects` is also updated to handle both `null` and `undefined` the same way.	2020-10-22 13:24:43 +02:00
Calixte Denizet	c30a3a94f0	JS - Add a function in api to get the fields ids in AcroForm::CO	2020-10-17 12:56:40 +02:00
Jonas Jenwald	29af15f37e	Add more validation in the `PDFDocument._hasOnlyDocumentSignatures` method If this method is ever passed invalid/unexpected data, or if during the course of parsing (since it's used recursively) such data is found, it will fail in a non-graceful way. Hence this patch, which ensures that we don't attempt to access non-existent properties and also that errors such as the one fixed in PR 12479 wouldn't have occurred.	2020-10-16 13:03:47 +02:00
Jonas Jenwald	3351d3476d	Don't store complex data in `PDFDocument.formInfo`, and replace the `fields` object with a `hasFields` boolean instead This patch is based on a couple of smaller things that I noticed when working on PR 12479. - Don't store the /Fields on the `formInfo` getter, since that feels like overloading it with unintended (and too complex) data, and utilize a `hasFields` boolean instead. This functionality was originally added in PR 12271, to help determine what kind of form data a PDF document contains, and I think that we should ensure that the return value of `formInfo` only consists of "simple" data. With these changes the `fieldObjects` getter instead has to look-up the /Fields manually, however that shouldn't be a problem since the access is guarded by a `formInfo.hasFields` check which ensures that the data both exists and is valid. Furthermore, most documents don't even have any /AcroForm data anyway. - Determine the `hasFields` property first, to ensure that it's always correct even if there are errors when checking e.g. the /XFA or /SigFlags entries, since the `fieldObjects` getter depends on it. - Simplify a loop in `fieldObjects`, since the object being accessed is a `Map` and those have built-in iteration support. - Use a higher logging level for errors in the `formInfo` getter, and include the actual error message, since that'd have helped with fixing PR 12479 a lot quicker. - Update the JSDoc comment in `src/display/api.js` to list the return values correctly, and also slightly extend/improve the description.	2020-10-16 12:47:27 +02:00
Calixte Denizet	71ecc3129b	Add the possibility to collect Javascript actions	2020-10-14 10:44:16 +02:00
Jonas Jenwald	9416b14e8b	Re-factor how the ESLint `no-var` rule is enabled in the `src/` folder This simplifies/consolidates the ESLint configuration slightly in the `src/` folder, and prevents the addition of any new files where `var` is being used.[1] Hence we no longer need to manually add `/* eslint no-var: error */` in files, which is easy to forget, and can instead disable the rule in the `src/core/` files where `var` is still in use. --- [1] Obviously the `no-var` rule can, in the same way as every other rule, be disabled on a case-by-case basis where actually necessary.	2020-10-03 20:15:29 +02:00
Jonas Jenwald	784a420027	Add support, in `Dict.merge`, for merging of "sub"-dictionaries This allows for merging of dictionaries one level deeper than previously. This could be useful e.g. for /Resources dictionaries, where you want to e.g. merge their respective /Font dictionaries (and other) together rather than picking just the first one.	2020-08-30 23:18:32 +02:00
Tim van der Meij	0f229d537f	Inline the `setup` method in the `parse` method in `src/core/document.js` Now that the `parse` method is simplified we can inline the `setup` method in the `parse` method since it's only two lines of code. This avoids some indirection.	2020-08-25 23:28:55 +02:00
Tim van der Meij	280207c740	Redo the form type detection logic and include unit tests Good form type detection is important to get reliable telemetry and to only show the fallback bar if a form cannot be filled out by the user. PDF.js only supports AcroForm data, so XFA data is explicitly unsupported (tracked in issue #2373). However, the previous form type detection couldn't separate AcroForm and XFA well enough, causing form type telemetry to be incorrect sometimes and the fallback bar to be shown for forms that could in fact be filled out by the user. The solution in this commit is found by studying the specification and the form documents that are available to us. In a nutshell the rules are: - There is XFA data if the `XFA` entry is a non-empty array or stream. - There is AcroForm data if the `Fields` entry is a non-empty array and it doesn't consist of only document signatures. The document signatures part was not handled in the old code, causing a document with only XFA data to also be marked as having AcroForm data. Moreover, the old code didn't check all the data types. Now that AcroForm and XFA can be distinguished, the viewer is configured to only show the fallback bar for documents that only have XFA data. If a document also has AcroForm data, the viewer can use that to render the form. We have not found documents where the XFA data was necessary in that case. Finally, we include unit tests to ensure that all cases are covered and move the form type detection out of the `parse` function so that it's only executed if the document information is actually requested (potentially making initial parsing a tiny bit faster).	2020-08-25 23:28:55 +02:00
Tim van der Meij	f20f0bcc78	Move the AcroForm logic from the document to the catalog The `AcroForm` entry is part of the catalog, not of the document, so its logic should be placed there instead. The document should look in the catalog to fetch it, and not have knowledge of `catDict`, which is a member internal to the catalog. Moreover, make the AcroForm member private on the document instance. It's only used internally and was also never intended to be public. For users it's exposed by the `getMetadata` API endpoint as `IsAcroFormPresent`. Only a boolean is exposed, so we now also only store the boolean on the document instance. Finally, the annotation code needs access to the full AcroForm dictionary, so it's updated to fetch the data from the catalog instead of the document that now only holds the boolean.	2020-08-25 23:28:55 +02:00
Tim van der Meij	b41a2f4d5a	Move the collection logic from the document to the catalog The `Collection` entry is part of the catalog, not of the document, so its logic should be placed there instead. The document should look in the catalog to fetch it, and not have knowledge of `catDict`, which is a member internal to the catalog. Moreover, remove the collection member from the document instance. It's only used internally and was also never intended to be public. For users it's exposed by the `getMetadata` API endpoint as `IsCollectionPresent`. Moving this out of the `parse` function makes sure that the getter is only executed if the document information is actually requested (potentially making initial parsing a tiny bit faster).	2020-08-25 23:28:55 +02:00
Tim van der Meij	935d95b462	Move the version logic from the document to the catalog The `Version` entry is part of the catalog, not of the document, so its logic should be placed there instead. The document should look in the catalog to fetch it, and not have knowledge of `catDict`, which is a member internal to the catalog. Moreover, make the version member private on the document instance. It's only used internally and was also never intended to be public. For users it's exposed by the `getMetadata` API endpoint as `PDFFormatVersion`. Finally, clarify how the version from the header and the version from the catalog are treated using a comment.	2020-08-25 23:28:55 +02:00
Calixte Denizet	1a6816ba98	Add support for saving forms	2020-08-12 10:32:59 +02:00
Calixte Denizet	584902dbf8	Add an annotation storage in order to save annotation data in acroforms	2020-07-24 10:50:11 +02:00
Jonas Jenwald	6381b5b08f	Add a `size` getter, to `Dict` instances, to provide an easier way of checking the number of entries This removes the need to manually call `Dict.getKeys()` and check its length.	2020-07-17 16:06:11 +02:00
Jonas Jenwald	4cc6797f17	Re-factor the `idFactory` functionality, used in the `core/`-code, and move the `fontID` generation into it Note how the `getFontID`-method in `src/core/fonts.js` is completely global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the same PDF document the `fontID`s will still be incremented continuously. For comparison the `createObjId` method, on `idFactory`, will always create a consistent id, assuming of course that the document and its pages are parsed/rendered in the same order. In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.) Please note: Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.	2020-07-07 16:33:31 +02:00
Jonas Jenwald	ca719ecaa4	Add local caching of `Function`s, by reference, in the `PDFFunctionFactory` (issue 2541) Note that compared to other structures, such as e.g. Images and ColorSpaces, `Function`s are not referred to by name, which however does bring the advantage of being able to share the cache for an entire page. Furthermore, similar to ColorSpaces, the parsing of individual `Function`s is generally fast enough to not really warrant trying to cache them in any "smarter" way than by reference. (Hence trying to do caching similar to e.g. Fonts would most likely be a losing proposition, given the amount of data lookup/parsing that'd be required.) Originally I tried implementing this similar to e.g. the recently added ColorSpace caching (and in a couple of different ways), however it unfortunately turned out to be quite ugly/unwieldy given the sheer number of functions/methods where you'd thus need to pass in a `LocalFunctionCache` instance. (Also, the affected functions/methods didn't exactly have short signatures as-is.) After going back and forth on this for a while it seemed to me that the simplest, or least "invasive" if you will, solution would be if each `PartialEvaluator` instance had its own `PDFFunctionFactory` instance (since the latter is already passed to all of the required code). This way each `PDFFunctionFactory` instance could have a local `Function` cache, without it being necessary to provide a `LocalFunctionCache` instance manually at every `PDFFunctionFactory.{create, createFromArray}` call-site. Obviously, with this patch, there's now (potentially) more `PDFFunctionFactory` instances than before when the entire document shared just one. However, each such instance is really quite small and it's also tied to a `PartialEvaluator` instance and those are not kept alive and/or cached. To reduce the impact of these changes, I've tried to make as many of these structures as possible lazily initialized, specifically: - The `PDFFunctionFactory`, on `PartialEvaluator` instances, since not all kinds of general parsing actually require it. For example: `getTextContent` calls won't cause any `Function` to be parsed, and even some `getOperatorList` calls won't trigger `Function` parsing (if a page contains e.g. no Patterns or "complex" ColorSpaces). - The `LocalFunctionCache`, on `PDFFunctionFactory` instances, since only certain parsing requires it. Generally speaking, only e.g. Patterns, "complex" ColorSpaces, and/or (some) SoftMasks will trigger any `Function` parsing. To put these changes into perspective, when loading/rendering all (14) pages of the default `tracemonkey.pdf` file there's now a total of 6 `PDFFunctionFactory` and 1 `LocalFunctionCache` instances created thanks to the lazy initialization. (If you instead would keep the document-"global" `PDFFunctionFactory` instance and pass around `LocalFunctionCache` instances everywhere, the numbers for the `tracemonkey.pdf` file would instead be something like 1 `PDFFunctionFactory` and 6 `LocalFunctionCache` instances.) All-in-all, I thus don't think that the `PDFFunctionFactory` changes should be generally problematic. With these changes, we can also modify (some) call-sites to pass in a `Reference` rather than the actual `Function` data. This is nice since `Function`s can also be `Streams`, which are not cached on the `XRef` instance (given their potential size), and this way we can avoid unnecessary lookups and thus save some additional time/resources. Obviously I had intended to include (standard) benchmark results with these changes, but for reasons I don't really understand the test run-time (even with `master`) of the document in issue 2541 is quite a bit slower than in the development viewer. However, logging the time it takes for the relevant `PDFFunctionFactory`/`PDFFunction` parsing shows that it takes approximately `0.5 ms` for the `Function` in question. Looking up a cached `Function`, on the other hand, is one order of magnitude faster which does add up when the same `Function` is invoked close to 2000 times.	2020-07-04 00:55:18 +02:00
Jonas Jenwald	02a1d0f6c5	Remove the unused `intent`/`pageIndex` properties from `OperatorList` instances (PR 11069 follow-up) Apparently I completely overlooked the fact that with the changes in PR 11069 these properties became completely unused, and consequently they thus ought to be removed.	2020-06-11 16:05:38 +02:00
Jonas Jenwald	8af70d75aa	Allow `GlobalImageCache.clear` to, optionally, only remove the actual data (PR 11912 follow-up) When "Cleanup" is triggered, you obviously need to remove all globally cached data on both the main- and worker-threads. However, the current implementation of the `GlobalImageCache.clear` method also means that we lose all information about which images were cached and not just their data. This thus has the somewhat unfortunate side-effect of requiring images, which were previously known to be "global", to again have to reach `NUM_PAGES_THRESHOLD` before being cached again. To avoid doing unnecessary parsing after "Cleanup", we can thus let `GlobalImageCache.clear` keep track of which images were cached while still removing their actual data. This should not have any significant impact on memory usage, since the only extra thing being kept is a `RefSetCache` (essentially an Object) with a couple of `Set`s containing only integers.	2020-05-23 11:30:24 +02:00
Jonas Jenwald	dda6626f40	Attempt to cache repeated images at the document, rather than the page, level (issue 11878) Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually reference exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the same images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1] Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2] However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages. In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be twenty copies of the image data). While this obviously benefits both CPU and memory usage in this case, for very large image data this patch may possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will only cache a certain number of image resources at the document level and simply fall back to the default behaviour. Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3] Please note: The patch will lead to small movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator. --- [1] There's e.g. PDF documents that use the same image as background on all pages. [2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer. [3] If the latter case were true, we could simply check for repeat images before parsing started and thus avoid handling any duplicate image resources.	2020-05-21 18:13:45 +02:00
Jonas Jenwald	73636e052a	Handle errors individually for each annotation in the `_parsedAnnotations` getter While working on PR 11872, it occurred to me that it probably wouldn't be a bad idea to change the `_parsedAnnotations` getter to handle errors individually for each annotation. This way, one broken/corrupt annotation won't prevent the rest of them from being e.g. fetched through the API.	2020-05-09 12:33:39 +02:00
Jonas Jenwald	e1f340a0c2	Use the ESLint `no-restricted-syntax` rule to ensure that `assert` is always called with two arguments Having `assert` calls without a message string isn't very helpful when debugging, and it turns out that it's easy enough to make use of ESLint to enforce better `assert` call-sites. In a couple of cases the `assert` calls were changed to "regular" throwing of errors instead, since that seemed more appropriate. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax	2020-05-05 13:40:05 +02:00
Jonas Jenwald	4aabd063fc	Gracefully handle annotation parsing errors in `Page.getOperatorList` (issue 11871) This should ensure that a page will always render successfully, even if there's errors during the Annotation fetching/parsing. Additionally the `OperatorList.addOpList` method is also adjusted to ignore invalid data, to make it slightly more robust.	2020-05-04 17:09:48 +02:00
Jonas Jenwald	1cc3dbb694	Enable the `dot-notation` ESLint rule Please note: These changes were done automatically, using the `gulp lint --fix` command. This rule is already enabled in mozilla-central, see https://searchfox.org/mozilla-central/rev/567b68b8ff4b6d607ba34a6f1926873d21a7b4d7/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#103-104 The main advantage, besides improved consistency, of this rule is that it reduces the size of the code (by 3 bytes for each case). In the PDF.js code-base there's close to 8000 instances being fixed by the `dot-notation` ESLint rule, which end up reducing the size of even the built files significantly; the total size of the `gulp mozcentral` build target changes from `3 247 456` to `3 224 278` bytes, which is a reduction of `23 178` bytes (or ~0.7%) for a completely mechanical change. A large number of these changes affect the (large) lookup tables used on the worker-thread, but given that they are still initialized lazily I don't think that the new formatting this patch introduces should undo any of the improvements from PR 6915. Please find additional details about the ESLint rule at https://eslint.org/docs/rules/dot-notation	2020-04-17 12:24:46 +02:00
Jonas Jenwald	426945b480	Update Prettier to version 2.0 Please note that these changes were done automatically, using `gulp lint --fix`. Given that the major version number was increased, there's a fair number of (primarily whitespace) changes; please see https://prettier.io/blog/2020/03/21/2.0.0.html In order to reduce the size of these changes somewhat, this patch maintains the old "arrowParens" style for now (once mozilla-central updates Prettier we can simply choose the same formatting, assuming it will differ here).	2020-04-14 12:28:14 +02:00
Jonas Jenwald	216cbca16c	Remove variable shadowing from the JavaScript files in the `src/core/` folder This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible. Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow	2020-03-23 18:28:30 +01:00
Jonas Jenwald	c5f67300e9	Rename the `isSpace` helper function to `isWhiteSpace` Trying to enable the ESLint rule `no-shadow`, against the `master` branch, would result in a fair number of errors in the `Glyph` class in `src/core/fonts.js`. Since the glyphs are exposed through the API, we can't very well change the `isSpace` property on `Glyph` instances. Thus the best approach seems, at least to me, to simply rename the `isSpace` helper function to `isWhiteSpace` which shouldn't cause any issues given that it's only used in the `src/core/` folder.	2020-03-12 11:36:59 +01:00
Jonas Jenwald	88c35d872f	Ensure that the PDF header contains an actual number (PR 11463 follow-up) While it would be nice to change the `PDFFormatVersion` property, as returned through `PDFDocumentProxy.getMetadata`, to a number (rather than a string) that would unfortunately be a breaking API change. However, it does seem like a good idea to at least validate the PDF header version on the worker-thread, rather than potentially returning an arbitrary string.	2020-02-07 12:25:07 +01:00
Tim van der Meij	3775b711ed	Merge pull request #11482 from Snuffleupagus/more-core-utils Convert `src/core/jpg.js` to use the `readUint16` helper function in `src/core/core_utils.js`, rather than re-implementing it twice	2020-01-25 21:38:34 +01:00
Jonas Jenwald	3f031f69c2	Move additional worker-thread only functions from `src/shared/util.js` and into a `src/core/core_utils.js` instead This moves the `log2`, `readInt8`, `readUint16`, `readUint32`, and `isSpace` functions since they are only used in the worker-thread.	2020-01-25 00:33:52 +01:00
Jonas Jenwald	090ff116d4	Ensure that full clean-up is always run when handling the "Terminate" message in `src/core/worker.js` This is beneficial in situations where the Worker is being re-used, for example with fake workers, since it ensures that things like font resources are actually released.	2020-01-16 15:11:56 +01:00
Jonas Jenwald	36881e3770	Ensure that all `import` and `require` statements, in the entire code-base, have a `.js` file extension In order to eventually get rid of SystemJS and start using native `import`s instead, we'll need to provide "complete" file identifiers since otherwise there'll be MIME type errors when attempting to use `import`.	2020-01-04 13:01:43 +01:00
Jonas Jenwald	a63f7ad486	Fix the linting errors, from the Prettier auto-formatting, that ESLint `--fix` couldn't handle This patch makes the following changes: - Remove no longer necessary inline `// eslint-disable-...` comments. - Fix `// eslint-disable-...` comments that Prettier moved down, thus causing new linting errors. - Concatenate strings which now fit on just one line. - Fix comments that are now too long. - Finally, and most importantly, adjust comments that Prettier moved down, since the new positions are often confusing or outright wrong.	2019-12-26 12:35:12 +01:00
Jonas Jenwald	de36b2aaba	Enable auto-formatting of the entire code-base using Prettier (issue 11444) Note that Prettier, purposely, has only limited [configuration options](https://prettier.io/docs/en/options.html). The configuration file is based on [the one in `mozilla central`](https://searchfox.org/mozilla-central/source/.prettierrc) with just a few additions (to avoid future breakage if the defaults ever change). Prettier is being used for a couple of reasons: - To be consistent with `mozilla-central`, where Prettier is already in use across the tree. - To ensure a consistent coding style everywhere, which is automatically enforced during linting (since Prettier is used as an ESLint plugin). This thus ends "all" formatting discussions once and for all, removing the need for review comments on most stylistic matters. Many ESLint options are now redundant, and I've tried my best to remove all the now unnecessary options (but I may have missed some). Note also that since Prettier considers the `printWidth` option as a guide, rather than a hard rule, this patch resorts to a small hack in the ESLint config to ensure that comments won't become too long. Please note: This patch is generated automatically, by appending the `--fix` argument to the ESLint call used in the `gulp lint` task. It will thus require some additional clean-up, which will be done in a separate commit. (On a more personal note, I'll readily admit that some of the changes Prettier makes are extremely ugly. However, in the name of consistency we'll probably have to live with that.)	2019-12-26 12:34:24 +01:00
Jonas Jenwald	8ec1dfde49	Add `// prettier-ignore` comments to prevent re-formatting of certain data structures There's a fair number of (primarily) `Array`s/`TypedArray`s whose formatting we don't want to disturb, since in many cases that would lead to the code becoming much more difficult to read and/or break existing inline comments. Please note: It may be a good idea to look through these cases individually, and possibly re-write some of them (especially the `String` ones) to reduce the need for all of these ignore commands.	2019-12-26 00:14:03 +01:00
Jonas Jenwald	dbb82f05fc	Re-factor the `find` helper function, in `src/core/document.js`, to search through the raw bytes rather than a string During initial parsing of every PDF document we're currently creating a few `1 kB` strings, in order to find certain commands needed for initialization. This seems inefficient, not to mention completely unnecessary, since we can just as well search through the raw bytes directly instead (similar to other parts of the code-base). One small complication here is the need to support backwards search, which does add some amount of "duplication" to this function. The main benefits here are: - No longer necessary to allocate temporary `1 kB` strings during initial parsing, thus saving some memory. - In practice, for well-formed PDF documents, the number of iterations required to find the commands is usually very low. (For the `tracemonkey.pdf` file, there's a total of only 30 loop iterations.)	2019-12-14 13:43:26 +01:00
Jonas Jenwald	b00835f589	Attempt to improve the `PDFDocument` error message for empty files (issue 5887) Given that the error in question is surfaced on the API-side, this patch makes the following changes: - Updates the wording such that it'll hopefully be slightly easier for users to understand. - Changes the plain `Error` to an `InvalidPDFException` instead, since that should work better with the existing Error handling. - Adds a unit-test which loads an empty PDF document (and also improves a pre-existing `InvalidPDFException` message and its test-case).	2019-12-09 15:45:50 +01:00
Jonas Jenwald	a02122e984	Ensure that `PDFDocument.checkFirstPage` waits for cleanup to complete (PR 10392 follow-up) Given how this method is currently used there shouldn't be any fonts loaded at the point in time where it's called, but it does seem like a bad idea to assume that that's always going to be the case. Since `PDFDocument.checkFirstPage` is already asynchronous, it's easy enough to simply await `Catalog.cleanup` here. (The patch also makes a tiny simplification in a loop in `Catalog.cleanup`.)	2019-12-07 12:31:41 +01:00
Jonas Jenwald	cc76132c24	Remove outdated, and misleading, JSDoc comment from the `PDFDocument` class The contents of this comment haven't been correct for years, ever since the library was properly split into main/worker-threads, so it's probably high time for this to be updated.	2019-11-25 11:36:29 +01:00
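
The form type detection rules stated in commit 280207c740 above (XFA data if the `XFA` entry is a non-empty array or stream; AcroForm data if the `Fields` entry is a non-empty array that doesn't consist of only document signatures) can be sketched roughly as follows. This is a minimal illustration and not the PDF.js implementation: plain objects stand in for parsed PDF dictionaries, a `Uint8Array` stands in for a stream, and the names `detectFormInfo`, `isDocumentSignature` and `shouldShowFallbackBar` are hypothetical. How a "document signature" is recognised is not spelled out in the commit message, so the check used here (signature field with an all-zero `Rect`) is an assumption.

```javascript
// Illustrative sketch only: plain objects stand in for parsed PDF
// dictionaries, and none of these names are part of the PDF.js API.

// Assumption: a "document signature" is modelled as a signature field
// (FT === "Sig") whose Rect is an array of zeros, i.e. it is not visible.
function isDocumentSignature(field) {
  const rect = field.Rect;
  const isInvisible = Array.isArray(rect) && rect.every(v => v === 0);
  return field.FT === "Sig" && isInvisible;
}

// Assumption: a stream is modelled as a Uint8Array.
function detectFormInfo(acroForm) {
  const info = { hasXfa: false, hasAcroForm: false };
  if (!acroForm) {
    return info;
  }

  // Rule 1: there is XFA data if XFA is a non-empty array or stream.
  const xfa = acroForm.XFA;
  info.hasXfa =
    (Array.isArray(xfa) && xfa.length > 0) ||
    (xfa instanceof Uint8Array && xfa.length > 0);

  // Rule 2: there is AcroForm data if Fields is a non-empty array that
  // does not consist of only document signatures.
  const fields = acroForm.Fields;
  info.hasAcroForm =
    Array.isArray(fields) &&
    fields.length > 0 &&
    !fields.every(isDocumentSignature);

  return info;
}

// Viewer-side rule from the same commit message: only show the fallback
// bar for documents that have XFA data but no AcroForm data.
function shouldShowFallbackBar(info) {
  return info.hasXfa && !info.hasAcroForm;
}
```

Under these assumptions, a document whose `Fields` array holds only invisible signature fields alongside an `XFA` stream is reported as XFA-only, which is the case the commit message says the old code got wrong.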
