pdf.js

Author	SHA1	Message	Date
Jonas Jenwald	a919959d83	Slightly simplify the `Catalog._readMarkInfo` method We don't need to first check if the Dictionary contains the key, since trying to get a non-existent key simply returns `undefined` and we're already ensuring that the value is a boolean. Furthermore, we shouldn't need to worry about the `Object.prototype` containing enumerable properties since the checks (in `src/core/worker.js`) done for `Array.prototype` indirectly also cover `Object`s. (Keep in mind that an `Array` is just a special kind of `Object` in JavaScript.)	2022-04-05 16:37:51 +02:00
Jonas Jenwald	1dc4713a0b	Re-factor the `isLittleEndian`/`isEvalSupported` caching This functionality is very old, hence we should be able to improve the caching a little bit with modern JavaScript features.	2022-04-05 16:01:01 +02:00
Calixte Denizet	f4fcb59a5e	Refactor some xfa*** getters in document.js - it's a follow-up of PR #14735.	2022-04-03 20:38:12 +02:00
Jonas Jenwald	f33ce5fc2d	Decode non-ASCII values found in the xfa:datasets (PR 14735 follow-up) Please note: This is possibly bad/wrong in general, but I figured that submitting it for review wouldn't hurt. It seems that even Adobe Reader doesn't handle the non-ASCII characters that appear in some of the fields correctly, however it should be pretty easy to improve things on the PDF.js side.	2022-04-01 11:54:34 +02:00
Jonas Jenwald	36a289d747	Merge pull request #14735 from calixteman/14685 [Annotations] Some annotations can have their values stored in the xfa:datasets	2022-04-01 11:30:16 +02:00
Calixte Denizet	0b597304c1	[Annotations] Some annotations can have their values stored in the xfa:datasets - it aims to fix #14685; - add a basic object to get values from the parsed datasets; - these annotations don't have an appearance so we must create one when printing or saving.	2022-04-01 10:28:04 +02:00
Jonas Jenwald	addb4cb12b	Use `String.prototype.repeat()` in a couple of spots Rather than using a temporary Array to manually create repeated strings, we can use `String.prototype.repeat()` instead. The reason that we didn't use this from the start is most likely because some browsers, notably IE, didn't support this; note https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/repeat#browser_compatibility	2022-03-30 15:42:40 +02:00
Calixte Denizet	ad3fb71a02	[Annotations] Add support for printing/saving choice list with multiple selections - it aims to fix issue #12189.	2022-03-29 18:59:44 +02:00
Calixte Denizet	18e79e3c0b	[text selection] Add the whitespaces present in the pdf in the text chunk - it aims to fix issue #14627; - the basic idea of the recent text refactoring was to only consider the rendered visible whitespaces. But sometimes, the heuristics aren't correct and although some whitespaces are in the text stream they weren't in the text chunks because they were too small. Hence we added some exceptions, for example, we always add a whitespace when it is between two non-whitespace chars but only when in the same Tj. So basically, this patch removes the constraint to have the chars in the same Tj (in using a circular buffer to save the two last chars) but don't add a space when the visible space is really too small (hence `NOT_A_SPACE_FACTOR`).	2022-03-27 14:34:56 +02:00
Jonas Jenwald	73d2ddac0d	Update npm packages Note that the Prettier update made it possible to move a couple of comments after `default:`-cases back to their original/intended positions, please see https://prettier.io/blog/2022/03/16/2.6.0.html	2022-03-20 10:59:13 +01:00
Tim van der Meij	5de6af4e64	Merge pull request #14683 from Snuffleupagus/sendTest-cleanup [src/display/api.js] Simplify the `sendTest` function, used with Worker initialization (PR 14291 follow-up)	2022-03-19 13:38:05 +01:00
Jonas Jenwald	c0736647f9	Add general iteration support in the `RefSet` and `RefSetCache` classes This patch removes the existing `forEach` methods, in favor of making the classes properly iterable instead. Given that the classes are using a `Set` respectively a `Map` internally, implementing this is very easy/efficient and allows us to simplify some existing code.	2022-03-18 14:27:34 +01:00
Jonas Jenwald	be2b1d5d2a	[src/display/api.js] Simplify the `sendTest` function, used with Worker initialization (PR 14291 follow-up) Given that we now only use Workers when `postMessage` transfers are supported, there's really no point in trying to send a "test" message without transfers present. Hence, if `postMessage` transfers are not supported by the browser, we'll now fallback to "fake" Workers immediately instead. The comment about Opera is also removed, since it was originally added back in PR 983 and mentions Opera `11.60` [which was released in 2011](https://en.wikipedia.org/wiki/History_of_the_Opera_web_browser#Version_11).	2022-03-16 13:25:41 +01:00
Jonas Jenwald	6a78f20b17	Simplify the `PDFDocument` constructor Originally the code in the `src/`-folder was shared between the main/worker-threads, and back then it probably made sense that the `PDFDocument` constructor accepted different arguments. However, for many years we've not been passing anything except Streams to `PDFDocument` and we should thus be able to slightly simplify that code. Note that for e.g. unit-tests of this code, using either a `NullStream` or a `StringStream` works just fine.	2022-03-08 17:13:47 +01:00
Tim van der Meij	5242c38af5	Merge pull request #14628 from Snuffleupagus/issue-14626 When `stopAtErrors` is set, throw rather than warn when exceeding `maxImageSize` (issue 14626)	2022-03-05 13:09:36 +01:00
Tim van der Meij	5d12ac576b	Merge pull request #14631 from Snuffleupagus/typedef-fixes Fix a couple of small typos in JSDoc `typedef` comments	2022-03-05 13:06:53 +01:00
Jonas Jenwald	939e6f0c4c	Fix a couple of small typos in JSDoc `typedef` comments While this doesn't affect the official API documentation, these cases should nonetheless be fixed.	2022-03-04 12:11:52 +01:00
Jonas Jenwald	1a7921dbf0	Compute the loca table `endOffset`, of the "first" glyph, correctly (issue 14618) When there are multiple empty glyphs at the start of the data, ensure that the "first" glyph gets a correct `endOffset` to avoid skipping it during parsing in the `sanitizeGlyph` function.	2022-03-03 14:22:45 +01:00
Jonas Jenwald	d0d5c596fb	When `stopAtErrors` is set, throw rather than warn when exceeding `maxImageSize` (issue 14626) The situation described in issue 14626 seems like a fairly special case, and it thus seems reasonable that we simply follow the same pattern as elsewhere in the `PartialEvaluator` when the `stopAtErrors` API-option is being used.	2022-03-03 13:11:29 +01:00
calixteman	046ff07ee3	Merge pull request #14610 from Snuffleupagus/jpx-resetContextProbabilities [JPEG 2000] Add support for resetContextProbabilities (bug 1731483)	2022-02-26 18:26:39 +01:00
Jonas Jenwald	99cd24ce3e	Remove the `isString` helper function The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isString`-calls.	2022-02-26 16:33:41 +01:00
Jonas Jenwald	6bd4e0f5af	Re-factor the `PDFDocument.documentInfo` method This removes the `DocumentInfoValidators` structure, and thus (slightly) simplifies the code overall. With these changes we only have to iterate through, and validate, the actually available Dictionary entries.	2022-02-26 16:33:21 +01:00
Tim van der Meij	cf7ce0aa7e	Merge pull request #14600 from Snuffleupagus/getPageIndex-more-validation [api-minor] Add validation for the `PDFDocumentProxy.getPageIndex` method	2022-02-26 15:30:00 +01:00
Jeff Muizelaar	9b9609a6d8	[JPEG 2000] Add support for resetContextProbabilities (bug 1731483)	2022-02-26 13:05:23 +01:00
Jonas Jenwald	172d007598	[api-minor] Add validation for the `PDFDocumentProxy.getPageIndex` method Currently we'll happily attempt to send any argument passed to this method over to the worker-thread, without doing any sort of validation. That could obviously be quite bad, since there's first of all no protection against sending unclonable data. Secondly, it's also possible to pass data that will cause the `Ref.get` call in the worker-thread to fail immediately. In order to address all of these issues, we'll now properly validate the argument passed to `PDFDocumentProxy.getPageIndex` and when necessary reject already on the main-thread instead.	2022-02-24 12:01:51 +01:00
Jonas Jenwald	ec87995050	Ensure that `Cmd`/`Name` is only initialized with string arguments Trying to use a non-string argument in either a `Cmd` or a `Name` is not intended, and would basically be an implementation error. Hence we can add a non-PRODUCTION check to enforce this, similar to the existing one used e.g. in the `Dict.set` method.	2022-02-23 22:39:12 +01:00
Tim van der Meij	2bb96a708c	Merge pull request #14598 from Snuffleupagus/rm-isBool Re-factor the `Catalog.viewerPreferences` method and remove the `isBool` helper function	2022-02-23 20:36:56 +01:00
Jonas Jenwald	3704283f5b	Remove the `isBool` helper function The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls.	2022-02-23 13:31:03 +01:00
Jonas Jenwald	82f1ee1755	Re-factor the `Catalog.viewerPreferences` method This removes the `ViewerPreferencesValidators` structure, and thus (slightly) simplifies the code overall. With these changes we only have to iterate through, and validate, the actually available Dictionary entries.	2022-02-23 13:25:56 +01:00
Jonas Jenwald	a2f9031e9a	Ensure that `Dict.set` only accepts string `key`s Trying to use a non-string `key` in a `Dict` is not intended, and would basically be an implementation error. Hence we can add a non-PRODUCTION check to enforce this, complementing the existing `value` check added in PR 11672.	2022-02-22 16:35:20 +01:00
Jonas Jenwald	05edd91bdb	Remove the `isNum` helper function The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isNum`-calls. These changes were mostly done using regular expression search-and-replace, with two exceptions: - In `Font._charToGlyph` we no longer unconditionally update the `width`, since that seems completely unnecessary. - In `PDFDocument.documentInfo`, when parsing custom entries, we now do the `typeof`-check once.	2022-02-22 11:55:34 +01:00
Jonas Jenwald	b282814e38	Prefer `instanceof Name` rather than calling `isName()` with one argument Unless you actually need to check that something is both a `Name` and also of the correct type, using `instanceof Name` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check. This patch uses ESLint to enforce this, since we obviously still want to keep the `isName` helper function for where it makes sense.	2022-02-21 12:45:00 +01:00
Jonas Jenwald	4df82ad31e	Prefer `instanceof Dict` rather than calling `isDict()` with one argument Unless you actually need to check that something is both a `Dict` and also of the correct type, using `instanceof Dict` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check. This patch uses ESLint to enforce this, since we obviously still want to keep the `isDict` helper function for where it makes sense.	2022-02-21 12:44:56 +01:00
Jonas Jenwald	67b658e8d5	Prefer `instanceof Cmd` rather than calling `isCmd()` with one argument Unless you actually need to check that something is both a `Cmd` and also of the correct type, using `instanceof Cmd` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check. This patch uses ESLint to enforce this, since we obviously still want to keep the `isCmd` helper function for where it makes sense.	2022-02-21 12:44:51 +01:00
Jonas Jenwald	2cb2f633ac	Remove the `isRef` helper function This helper function is not really needed, since it's just a wrapper around a simple `instanceof` check, and it only adds unnecessary indirection in the code.	2022-02-19 15:33:42 +01:00
Jonas Jenwald	1a31855977	Remove the `isStream` helper function At this point all the various Stream-classes extend an abstract base-class, hence this helper function is no longer necessary and only adds unnecessary indirection in the code.	2022-02-17 13:51:36 +01:00
Jonas Jenwald	fd319e94b3	Add a missing string-check in the `_collectJS` helper function Unfortunately I don't have a test-case that breaks without this change, however the `stringToPDFString` helper function will fail if anything other than a string is passed to it. The changes in this patch thus make this code more-or-less identical to that found in the `Catalog.{_collectJavaScript, parseDestDictionary}` methods.	2022-02-16 13:43:42 +01:00
Calixte Denizet	18e3a98c2b	[api-minor] Don't add in the text content the chars which are out-of-page (bug 1755201) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1755201; - if the glyph position is not within the view then skip it.	2022-02-13 21:07:11 +01:00
Jonas Jenwald	b89595fd20	[api-minor] Remove the, in `legacy` builds, bundled `ReadableStream` polyfill According to the MDN compatibility data, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility, all browsers that we support have native `ReadableStream` implementations (since quite some time too). Hence only Node.js is now lagging behind w.r.t. `ReadableStream` support, and its experimental implementation doesn't really help us given the life-span of the LTS releases (see https://en.wikipedia.org/wiki/Node.js#Releases). It seems quite unfortunate to bundle a `ReadableStream` polyfill in the `legacy` builds when it's unnecessary in browsers, given its overall size, but fortunately we can avoid that by simply listing `web-streams-polyfill` as a dependency for the `pdfjs-dist` library.	2022-02-13 10:15:58 +01:00
Jonas Jenwald	64f3dbeb48	Let `Lexer.getNumber` treat a single minus sign as zero (bug 1753983) This appears to be consistent with the behaviour in both Adobe Reader and PDFium (in Google Chrome); this is essentially the same approach as used for a single decimal point in PR 9827.	2022-02-07 17:09:47 +01:00
Jonas Jenwald	403baa7bba	[api-minor] Remove the `normalizeWhitespace` option in the `PDFPageProxy.{getTextContent, streamTextContent}` methods (issue 14519, PR 14428 follow-up) With these changes, we'll now always replace all whitespaces with standard spaces (0x20). This behaviour is already, since many years, the default in both the viewer and the browser-tests.	2022-02-03 09:17:22 +01:00
Calixte Denizet	ae842e1c3a	[api-minor] Annotations - Adjust the font size in text field in considering the total width (bug 1721335) - it aims to fix #14502 and bug 1721335; - Acrobat and Pdfium do the same; - it'll avoid to have truncated data when printed; - change the factor to compute font size in using field height: lineHeight = 1.35*fontSize - this is the value used by Acrobat. - in order to not have truncated strings on the bottom, add few basic metrics for standard fonts.	2022-01-30 15:53:31 +01:00
Calixte Denizet	3a7004ca25	Take into account all rotations before comparing glyph positions - it aims to fix #14497; - previously, only rotations with an angle 0, 90, 180 or 270 were taken into account; - so generalize to any angle but keep the fast path for 0, 90, ... because they're likely more common than anything else.	2022-01-26 17:19:00 +01:00
Jonas Jenwald	8836593b9e	Add a (global) cache to the `getCharUnicodeCategory` function Given that the regular expression has already become more complex (after the initial patch adding it), it seems to me that it probably cannot hurt to add a global cache to reduce unnecessary re-parsing. Obviously the `Glyph`-instances are being cached per font, however in most documents multiple fonts are being used and in practice there's very often a fair amount of overlap between the /ToUnicode-data in different fonts[1]. Consider for example loading and rendering the entire `tracemonkey.pdf` document (from the test-suite), which isn't a particularly large document. In that case the `getCharUnicodeCategory` function is being called a total of `601` times, however there's only `106` unique unicode-chars being checked. Please note: In practice I suppose that this won't have a huge effect on overall performance, however given the relative simplicity of this patch I figured that it'd not hurt to submit it for review. --- [1] Consider e.g. how there's usually different fonts used for regular, bold, respectively italic text.	2022-01-25 09:59:34 +01:00
Calixte Denizet	e1d3a3b414	Remove the invisible format marks from the text chunks - it aims to fix issue #9186.	2022-01-24 13:47:24 +01:00
Tim van der Meij	23b6fde9fc	Merge pull request #14464 from Snuffleupagus/issue-14462 Support Type1 font files with incomplete /CharStrings definitions (issue 14462)	2022-01-19 20:38:46 +01:00
calixteman	b0231cc887	Merge pull request #14456 from calixteman/1749563 Font renderer - get int8 instead of uint8 in composite glyphes (bug 1749563)	2022-01-19 01:20:49 -08:00
Calixte Denizet	74f25d2755	Font renderer - get int8 instead of uint8 in composite glyphes (bug 1749563) - it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1749563; - use some helper functions to get (u|i)int** values in buffer: it helps to have a clearer code; - in composite glyphes the translations values with a transformations are signed so consequently get some int8 instead of uint8; - add few TODOs.	2022-01-18 22:06:23 +01:00
Jonas Jenwald	a13ae5d97d	Support Type1 font files with incomplete /CharStrings definitions (issue 14462) Please refer to https://www.pdfa.org/norm-refs/Type1Fonts.pdf#page=15 for the expected format for the /CharStrings entries. In the referenced PDF document the /CharStrings are missing the expected end-token, which causes us to swallow the start of the next glyph name.	2022-01-17 18:55:22 +01:00
Jonas Jenwald	ba37d600d7	Make the `normalizeWhitespace` handling, in the `PartialEvaluator`, more efficient (PR 14428 follow-up) After the changes in PR 14428 we can directly, and more efficiently, handle whitespace conversion in `PartialEvaluator.getTextContent` when the `normalizeWhitespace` option is being used. This way we no longer need a separate helper function for this, and can avoid having to (again) iterate through the text and checking each character. Finally, this also removes the need for using a regular expression on e.g. all non-ASCII text.	2022-01-16 08:29:21 +01:00


2527 Commits