Commit Graph

5374 Commits

Author SHA1 Message Date
Jonas Jenwald
99cd24ce3e Remove the isString helper function
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isString`-calls.
2022-02-26 16:33:41 +01:00
Jonas Jenwald
6bd4e0f5af Re-factor the PDFDocument.documentInfo method
This removes the `DocumentInfoValidators` structure, and thus (slightly) simplifies the code overall. With these changes we only have to iterate through, and validate, the actually available Dictionary entries.
2022-02-26 16:33:21 +01:00
Tim van der Meij
f782f5e5bb
Merge pull request #14607 from Snuffleupagus/wrapReason-unreachable
Simplify the `wrapReason` helper function
2022-02-26 15:37:29 +01:00
Tim van der Meij
cf7ce0aa7e
Merge pull request #14600 from Snuffleupagus/getPageIndex-more-validation
[api-minor] Add validation for the  `PDFDocumentProxy.getPageIndex` method
2022-02-26 15:30:00 +01:00
Jeff Muizelaar
9b9609a6d8 [JPEG 2000] Add support for resetContextProbabilities (bug 1731483) 2022-02-26 13:05:23 +01:00
Calixte Denizet
46369e4aa5 Fix some issues with lineWidth < 1 after transform (bug 1753075, bug 1743245, bug 1710019)
- it aims to fix:
   - https://bugzilla.mozilla.org/show_bug.cgi?id=1753075;
   - https://bugzilla.mozilla.org/show_bug.cgi?id=1743245;
   - https://bugzilla.mozilla.org/show_bug.cgi?id=1710019;
   - issue #13211;
   - issue #14521.
 - previously we were trying to adjust lineWidth to have something correct after the current transform is applied but this approach was not correct because finally the pixel is rescaled with the same factors in both directions.
  And sometimes those factors must be different (see bug 1753075).
 - So the idea of this patch is to apply a scale matrix to the current transform just before setting lineWidth and stroking. This scale matrix is computed in order to ensure that after transform, a pixel will have its two thickness greater than 1.
2022-02-25 18:37:34 +01:00
Jonas Jenwald
28fc8248f0 Simplify the wrapReason helper function
All call-sites that use `wrapReason` should be passing a (possibly cloned) `Error` to the helper function, hence we shouldn't need to have a fallback code-path for any other data.
Note that for the `cancel`/`error` methods on Streams, since PR 11115 we've been asserting that the argument is in fact an `Error` as intended.
When calling `wrapReason` from *rejected* Promises, we should also be guaranteed that an `Error` is provided thanks to the ESLint rules `no-throw-literal` and `prefer-promise-reject-errors`.
2022-02-25 18:31:12 +01:00
Jonas Jenwald
172d007598 [api-minor] Add validation for the PDFDocumentProxy.getPageIndex method
Currently we'll happily attempt to send any argument passed to this method over to the worker-thread, without doing any sort of validation.
That could obviously be quite bad, since there's first of all no protection against sending unclonable data. Secondly, it's also possible to pass data that will cause the `Ref.get` call in the worker-thread to fail immediately.

In order to address all of these issues, we'll now properly validate the argument passed to `PDFDocumentProxy.getPageIndex` and when necessary reject already on the main-thread instead.
2022-02-24 12:01:51 +01:00
Jonas Jenwald
2be8036eb7 [api-minor] Reduce duplication in the "gets non-existent page" unit-test 2022-02-24 11:25:21 +01:00
Jonas Jenwald
ec87995050 Ensure that Cmd/Name is only initialized with string arguments
Trying to use a non-string argument in either a `Cmd` or a `Name` is not intended, and would basically be an implementation error. Hence we can add a non-PRODUCTION check to enforce this, similar to the existing one used e.g. in the `Dict.set` method.
2022-02-23 22:39:12 +01:00
Tim van der Meij
2bb96a708c
Merge pull request #14598 from Snuffleupagus/rm-isBool
Re-factor the `Catalog.viewerPreferences` method and remove the `isBool` helper function
2022-02-23 20:36:56 +01:00
Tim van der Meij
409cbfc817
Merge pull request #14597 from Snuffleupagus/Dict-set-validate-key
Ensure that `Dict.set` only accepts string `key`s
2022-02-23 20:31:36 +01:00
Tim van der Meij
1b51e10c9c
Merge pull request #14595 from Snuffleupagus/structuredClone-comment-support
Update the support information for `structuredClone` (PR 14392 follow-up)
2022-02-23 20:27:35 +01:00
Jonas Jenwald
3704283f5b Remove the isBool helper function
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls.
2022-02-23 13:31:03 +01:00
Jonas Jenwald
82f1ee1755 Re-factor the Catalog.viewerPreferences method
This removes the `ViewerPreferencesValidators` structure, and thus (slightly) simplifies the code overall. With these changes we only have to iterate through, and validate, the actually available Dictionary entries.
2022-02-23 13:25:56 +01:00
Jonas Jenwald
a2f9031e9a Ensure that Dict.set only accepts string keys
Trying to use a non-string `key` in a `Dict` is not intended, and would basically be an implementation error. Hence we can add a non-PRODUCTION check to enforce this, complementing the existing `value` check added in PR 11672.
2022-02-22 16:35:20 +01:00
Jonas Jenwald
48985bd221 Update the support information for structuredClone (PR 14392 follow-up)
When the `structuredClone` polyfill was added, the support information in Safari was unclear. Given that an actual version *number* is now available, see below, it seems like a good idea to update the comment accordingly.

https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#browser_compatibility
2022-02-22 12:30:54 +01:00
Jonas Jenwald
05edd91bdb Remove the isNum helper function
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isNum`-calls.

These changes were *mostly* done using regular expression search-and-replace, with two exceptions:
 - In `Font._charToGlyph` we no longer unconditionally update the `width`, since that seems completely unnecessary.
 - In `PDFDocument.documentInfo`, when parsing custom entries, we now do the `typeof`-check once.
2022-02-22 11:55:34 +01:00
Jonas Jenwald
b282814e38 Prefer instanceof Name rather than calling isName() with one argument
Unless you actually need to check that something is both a `Name` and also of the *correct* type, using `instanceof Name` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check.

This patch uses ESLint to enforce this, since we obviously still want to keep the `isName` helper function for where it makes sense.
2022-02-21 12:45:00 +01:00
Jonas Jenwald
4df82ad31e Prefer instanceof Dict rather than calling isDict() with one argument
Unless you actually need to check that something is both a `Dict` and also of the *correct* type, using `instanceof Dict` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check.

This patch uses ESLint to enforce this, since we obviously still want to keep the `isDict` helper function for where it makes sense.
2022-02-21 12:44:56 +01:00
Jonas Jenwald
67b658e8d5 Prefer instanceof Cmd rather than calling isCmd() with *one* argument
Unless you actually need to check that something is both a `Cmd` and also of the *correct* type, using `instanceof Cmd` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check.

This patch uses ESLint to enforce this, since we obviously still want to keep the `isCmd` helper function for where it makes sense.
2022-02-21 12:44:51 +01:00
Jonas Jenwald
bad15894fc Improve the JSDocs for the PDFObjects class
Given that we expose `PDFObjects`-instances, via the `commonObjs` and `objs` properties, on the `PDFPageProxy`-instances this ought to help provide slightly better TypeScript definitions.
2022-02-20 13:02:14 +01:00
Jonas Jenwald
f4712bc0ad Simplify the data stored on PDFObjects-instances
The manually tracked `resolved`-property is no longer necessary, since the same information is now directly available on all `PromiseCapability`-instances.
Furthermore, since the `PDFObjects.resolve` method is not documented as accepting e.g. only Object-data, we probably shouldn't resolve the `PromiseCapability` with the `data` and instead only store it on the `PDFObjects`-instance.[1]

---
[1] While Objects are passed by reference in JavaScript, other primitives such as e.g. strings are passed by value and the current implementation *could* thus lead to increased memory usage. Given how we're using `PDFObjects` in the PDF.js code-base none of this should be an issue, but it still cannot hurt to change this.
2022-02-20 12:33:33 +01:00
Jonas Jenwald
beecde3229 Introduce (some) private properties/methods in the PDFObjects class
This ensures that the underlying data cannot be accessed directly, from the outside, since that's definately not intended here.
Note that we expose `PDFObjects`-instances, via the `commonObjs` and `objs` properties, on the `PDFPageProxy`-instances hence these changes really cannot hurt.
2022-02-20 12:23:30 +01:00
Jonas Jenwald
2cb2f633ac Remove the isRef helper function
This helper function is not really needed, since it's just a wrapper around a simple `instanceof` check, and it only adds unnecessary indirection in the code.
2022-02-19 15:33:42 +01:00
Tim van der Meij
df0aa1a9c4
Merge pull request #14575 from Snuffleupagus/rm-isStream
Remove the `isStream` helper function
2022-02-19 14:59:19 +01:00
Jonas Jenwald
05efe3017b Change PixelsPerInch to a class with static properties (issue 14579)
*Please note:* I'm completely fine with this patch being rejected, and the issue instead closed as WONTFIX, since this is unfortunately a case where the TypeScript definitions dictate how we can/cannot write JavaScript code.

Apparently the TypeScript definitions generation converts the existing `PixelsPerInch` code into a `namespace` and simply ignores the getter; please see a7fc0d33a1/types/src/display/display_utils.d.ts (L223-L226)

Initially I tried tagging `PixelsPerInch` as en `@enum`, see https://jsdoc.app/tags-enum.html, however that unfortunately didn't help.
Hence the only good/simple solution, as far as I'm concerned, is to convert `PixelsPerInch` into a class with `static` properties. This patch results in the following diff, for the `gulp types` build target:
```diff
@@ -195,9 +195,10 @@
      */
     static toDateObject(input: string): Date | null;
 }
-export namespace PixelsPerInch {
-    const CSS: number;
-    const PDF: number;
+export class PixelsPerInch {
+    static CSS: number;
+    static PDF: number;
+    static PDF_TO_CSS_UNITS: number;
 }
 declare const RenderingCancelledException_base: any;
 export class RenderingCancelledException extends RenderingCancelledException_base {
```
2022-02-19 09:05:40 +01:00
Jonas Jenwald
530af48b8e
Merge pull request #14569 from brendandahl/smask-state
Fix canvas state getting out of sync from smasks. (bug 1755507)
2022-02-18 19:35:58 +01:00
Brendan Dahl
7def6d12c8 Fix canvas state getting out of sync from smasks. (bug 1755507)
Soft masks can be enabled/disabled at anytime and at different
points in the save/restore stack. This can lead to
the amount of save/restores becoming unbalanced across the
two canvases. Instead of save/restoring on the temporary canvas
change it so we only track state on the main (suspended canvas).

I was also getting an out balance stack from patterns, so I've also
fixed that and added a warning that will at least show up on chrome.
It would be nice to add this so Firefox at some point too.

Fixes #11328, #14297 and bug 1755507
2022-02-17 17:38:32 -08:00
Jonas Jenwald
1a31855977 Remove the isStream helper function
At this point all the various Stream-classes extends an abstract base-class, hence this helper function is no longer necessary and only adds unnecessary indirection in the code.
2022-02-17 13:51:36 +01:00
Jonas Jenwald
fd319e94b3 Add a missing string-check in the _collectJS helper function
Unfortunately I don't have a test-case that breaks without this change, however the `stringToPDFString` helper function will fail if anything other than a string is passed to it.
The changes in this patch thus make this code more-or-less identical to that found in the `Catalog.{_collectJavaScript, parseDestDictionary}` methods.
2022-02-16 13:43:42 +01:00
Calixte Denizet
18e3a98c2b [api-minor] Don't add in the text content the chars which are out-of-page (bug 1755201)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1755201;
- if the glyph position is not within the view then skip it.
2022-02-13 21:07:11 +01:00
Tim van der Meij
c37d785b2a
Merge pull request #14560 from Snuffleupagus/Node-ReadableStream-polyfill
[api-minor] Remove the, in `legacy` builds, bundled `ReadableStream` polyfill
2022-02-13 14:08:22 +01:00
Jonas Jenwald
b89595fd20 [api-minor] Remove the, in legacy builds, bundled ReadableStream polyfill
According to the MDN compatibility data, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility, all browsers that we support have native `ReadableStream` implementations (since quite some time too).

Hence only Node.js is now lagging behind w.r.t. `ReadableStream` support, and its experimental implementation doesn't really help us given the life-span of the LTS releases (see https://en.wikipedia.org/wiki/Node.js#Releases).
It seems quite unfortunate to bundle a `ReadableStream` polyfill in the `legacy` builds when it's unnecessary in browsers, given its overall size, but fortunately we can avoid that by simply listing `web-streams-polyfill` as a dependency for the `pdfjs-dist` library.
2022-02-13 10:15:58 +01:00
Jonas Jenwald
d642d34500 Remove the UTF-8 fallback, when TextDecoder is missing, from the Content-Disposition parser
Given that `TextDecoder` is now supported by all modern browsers/environments, please see https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder#browser_compatibility, there's no longer any good reason to keep a UTF-8 fallback in the Content-Disposition parser.
2022-02-12 10:30:25 +01:00
Jonas Jenwald
b87a243222 [api-minor] Stop exposing the createObjectURL helper function in the API
With recent changes, specifically PR 14515 *and* the previous patch, the `createObjectURL` helper function is now only used with the SVG back-end.
All other call-sites, throughout the code-base, are now using `URL.createObjectURL(...)` directly and it no longer seems necessary to keep exposing the helper function in the API.
Finally, the `createObjectURL` helper function is moved into the `src/display/svg.js` file to avoid unnecessarily duplicating this code on both the main- and worker-threads.
2022-02-10 12:01:35 +01:00
Brendan Dahl
f8b2a99ddc
Merge pull request #14543 from Snuffleupagus/bug-1753983
Let `Lexer.getNumber` treat a single minus sign as zero (bug 1753983)
2022-02-09 14:06:35 -08:00
Jonas Jenwald
1f0fb270b1 [api-minor] Ensure that the PDFDocumentLoadingTask-promise is rejected when cancelling the PasswordPrompt (bug 1754421)
This is essentially a *continuation* of PR 7926, where we added support for rejecting the current `PDFDocumentLoadingTask`-promise by throwing inside of the `onPassword`-callback.
Hence the naive way to address [bug 1754421](https://bugzilla.mozilla.org/show_bug.cgi?id=1754421) would be to simply throw in the `onPassword`-callback used in the default viewer. However it unfortunately turns out to not work, since the password input/validation is asynchronous, and we thus need another approach.

The simplest solution that I can come up with here, is thus to *extend* the `onPassword`-callback to also reject the current `PDFDocumentLoadingTask`-instance if an `Error` is explicitly passed as the input to the callback function. (This doesn't feel great, but I cannot see a better solution that isn't really complicated.)
2022-02-09 15:09:20 +01:00
Jonas Jenwald
64f3dbeb48 Let Lexer.getNumber treat a single minus sign as zero (bug 1753983)
This appears to be consistent with the behaviour in both Adobe Reader and PDFium (in Google Chrome); this is essentially the same approach as used for a single decimal point in PR 9827.
2022-02-07 17:09:47 +01:00
Jonas Jenwald
03f5f6a421 [api-minor] Update the minimum supported browser versions
Please note that while we "support" some (by now) fairly old browsers, that essentially means that the library (and viewer) will load and that the basic functionality will work as intended.[1]
However, in older browsers, some functionality may not be available and generally we'll ask users to update to a modern browser when bugs (specific to old browsers) are reported.[2]

There's always a question of just how old browsers the PDF.js contributors can realistically support, and here I'm suggesting that we place the cut-off point at approximately *three* years.
With that in mind, this patch updates the *minimum* supported browsers (and environments) as follows:
 - Chrome 73, which was released on 2019-03-12; see https://en.wikipedia.org/wiki/Google_Chrome_version_history
 - Firefox ESR (as before); see https://wiki.mozilla.org/Release_Management/Calendar
 - Safari 12.1, which was released on 2019-03-25; see https://en.wikipedia.org/wiki/Safari_version_history#Safari_12
 - Node.js 12, which was release on 2019-04-23 (and will soon reach EOL); see https://en.wikipedia.org/wiki/Node.js#Releases

---
[1] Assuming a `legacy`-build is being used, of course.

[2] In general it's never a good idea to use an old/outdated browser, since those may contain *known* security vulnerabilities.
2022-02-06 13:06:43 +01:00
Jonas Jenwald
403baa7bba [api-minor] Remove the normalizeWhitespace option in the PDFPageProxy.{getTextContent, streamTextContent} methods (issue 14519, PR 14428 follow-up)
With these changes, we'll now *always* replace all whitespaces with standard spaces (0x20). This behaviour is already, since many years, the default in both the viewer and the browser-tests.
2022-02-03 09:17:22 +01:00
calixteman
7a034706ba
Merge pull request #14510 from calixteman/14502
[api-minor] Annotations - Adjust the font size in text field in considering the total width (bug 1721335)
2022-01-30 15:58:51 +01:00
Calixte Denizet
ae842e1c3a [api-minor] Annotations - Adjust the font size in text field in considering the total width (bug 1721335)
- it aims to fix #14502 and bug 1721335;
 - Acrobat and Pdfium do the same;
 - it'll avoid to have truncated data when printed;
 - change the factor to compute font size in using field height: lineHeight = 1.35*fontSize
  - this is the value used by Acrobat.
 - in order to not have truncated strings on the bottom, add few basic metrics for standard fonts.
2022-01-30 15:53:31 +01:00
Jonas Jenwald
7cc761a8c0 Polyfill structuredClone with core-js (PR 13948 follow-up)
This allows us to remove the manually implemented `structuredClone` polyfill, thus reducing the maintenance burden for the `LoopbackPort` class; refer to https://github.com/zloirock/core-js#structuredclone

*Please note:* While `structuredClone` support landed already in Firefox 94, Google Chrome only added it in version 98 (currently in Beta). However, given that the `LoopbackPort` will only be used together with *fake workers* in browsers this shouldn't be too much of a problem.[1]
For Node.js environments, where *fake workers* are unfortunately necessary, using a `legacy/`-build is already required which thus guarantees that the `structuredClone` polyfill is available.

Also, the patch updates core-js to the latest version since that one includes `structuredClone` improvements; please see https://github.com/zloirock/core-js/releases/tag/v3.20.3

---
[1] Given that we only support browsers with proper worker support, if *fake workers* are being used that essentially indicates a configuration problem/error.
2022-01-27 21:11:42 +01:00
Jonas Jenwald
8f6965b197
Merge pull request #14506 from Snuffleupagus/license_header_2022
Update the year in the `license_header` files
2022-01-27 19:34:56 +01:00
Jonas Jenwald
00bd549e82 Update the year in the license_header files
This also includes a couple of files that are included as-is in the `pdfjs-dist` library.
2022-01-27 19:24:31 +01:00
calixteman
838909f8c1
Merge pull request #14491 from quaoaris/lines-rendered-too-thick
fix for lines (stroke) are rendered too thick  (Bug 1743245)
2022-01-27 18:46:26 +01:00
Calixte Denizet
3a7004ca25 Take into account all rotations before comparing glyph positions
- it aims to fix #14497;
 - previously, only rotations with an angle 0, 90, 180 or 270 were taken into account;
 - so generalize to any angle but keep the fast path for 0, 90, ... because they're likely more common than anything else.
2022-01-26 17:19:00 +01:00
quaoaris
3f77d80f31 fix for lines (stroke) are rendered too thick (Bug 1743245)
This commit fixes Bug 1743245 (Grided PDF file lines rendered too thick) which was created by a fix for  #12868 .
The lineWidth was set to round(1 * this._combinedScaleFactor) when the pixel is drawn as a parallelorgam with a height <1. This fix changes this to floor(1*this._combinedScaleFactor) .

This change shows a visual result comparable to Chrome and Acrobat.
Regarding the last PR 3 statements in canvas.js are affected and will change with this commit (stroke and paintChar).

renaming the reference files to naming comvention
2022-01-25 10:27:30 +01:00
Jonas Jenwald
8836593b9e Add a (global) cache to the getCharUnicodeCategory function
Given that the regular expression has already become more complex (after the initial patch adding it), it seems to me that it probably cannot hurt to add a global cache to reduce unnecessary re-parsing.
Obviously the `Glyph`-instances are being cached *per* font, however in most documents multiple fonts are being used and in practice there's very often a fair amount of overlap between the /ToUnicode-data in different fonts[1].

Consider for example loading and rendering the entire `tracemonkey.pdf` document (from the test-suite), which isn't a particularily large document. In that case the `getCharUnicodeCategory` function is being called a total of `601` times, however there's only `106` *unique* unicode-chars being checked.

*Please note:* In practice I suppose that this won't have a *huge* effect on overall performance, however given the relative simplicity of this patch I figured that it'd not hurt to submit it for review.

---
[1] Consider e.g. how there's usually different fonts used for regular, bold, respectively italic text.
2022-01-25 09:59:34 +01:00
Calixte Denizet
e1d3a3b414 Remove the invisible format marks from the text chunks
- it aims to fix issue #9186.
2022-01-24 13:47:24 +01:00
calixteman
88236e1163
Merge pull request #14430 from calixteman/beforeinput
[JS] Use beforeinput event to trigger a keystroke event in the sandbox
2022-01-23 20:42:33 +01:00
Calixte Denizet
6ac296e48e [JS] Use beforeinput event to trigger a keystroke event in the sandbox
- it aims to fix issue #14307;
 - this event has been added recently in Firefox and we can now use it;
 - fix few bugs in aform.js or in annotation_layer.js;
 - add some integration tests to test keystroke events (see `AFSpecial_Keystroke`);
 - make dispatchEvent in the quickjs sandbox async.
2022-01-23 19:53:01 +01:00
Tim van der Meij
23b6fde9fc
Merge pull request #14464 from Snuffleupagus/issue-14462
Support Type1 font files with incomplete /CharStrings definitions (issue 14462)
2022-01-19 20:38:46 +01:00
calixteman
b0231cc887
Merge pull request #14456 from calixteman/1749563
Font renderer - get int8 instead of uint8 in composite glyphes (bug 1749563)
2022-01-19 01:20:49 -08:00
Calixte Denizet
74f25d2755 Font renderer - get int8 instead of uint8 in composite glyphes (bug 1749563)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1749563;
 - use some helper functions to get (u|i)int** values in buffer: it helps to have a clearer code;
 - in composite glyphes the translations values with a transformations are signed so consequently get some int8 instead of uint8;
 - add few TODOs.
2022-01-18 22:06:23 +01:00
Jonas Jenwald
a13ae5d97d Support Type1 font files with incomplete /CharStrings definitions (issue 14462)
Please refer to https://www.pdfa.org/norm-refs/Type1Fonts.pdf#page=15 for the expected format for the /CharStrings entries.
In the referenced PDF document the /CharStrings are missing the expected end-token, which causes us to swallow the start of the next glyph name.
2022-01-17 18:55:22 +01:00
Jonas Jenwald
ba37d600d7 Make the normalizeWhitespace handling, in the PartialEvaluator, more efficient (PR 14428 follow-up)
After the changes in PR 14428 we can *directly*, and more efficiently, handle whitespace conversion in `PartialEvaluator.getTextContent` when the `normalizeWhitespace` option is being used.
This way we no longer need a separate helper function for this, and can avoid having to (again) iterate through the text and checking each character. Finally, this also removes the need for using a regular expression on e.g. all non-ASCII text.
2022-01-16 08:29:21 +01:00
calixteman
da953f4b64
Merge pull request #14428 from calixteman/typo
Use the correct dimension to know if we have to add an EOL in vertical mode
2022-01-15 12:47:10 -08:00
Calixte Denizet
9dae421a0d Handle all the whitespaces the same way when creating text chunks 2022-01-15 21:44:00 +01:00
Tim van der Meij
922dac035c
Merge pull request #14448 from Snuffleupagus/Type3-circular-refs
Prevent circular references in Type3 fonts
2022-01-15 14:11:47 +01:00
Tim van der Meij
a72d188599
Merge pull request #14439 from Snuffleupagus/issue-14438
Ignore Annotations with empty /Rect-entries in the display-layer (issue 14438)
2022-01-15 14:11:25 +01:00
Tim van der Meij
c0d2932faf
Merge pull request #14454 from Snuffleupagus/util-more-unreachable
Replace some `assert` usage with `unreachable` in the `src/shared/util.js` file
2022-01-15 13:52:10 +01:00
Tim van der Meij
625f829842
Merge pull request #14446 from Snuffleupagus/issue-14435
Expose even more API-functionality in the TypeScript definitions (issue 14435, PR 14013 follow-up)
2022-01-15 13:46:11 +01:00
Jonas Jenwald
0e1b93bf20 Replace some assert usage with unreachable in the src/shared/util.js file
Inlining the checks should be a *tiny bit* more efficient, since it avoids have to make *unconditional* function calls in these fairly commonly used helper functions.
2022-01-15 13:01:25 +01:00
Jonas Jenwald
12d8f0b64d Re-factor the stringToPDFString helper function for UTF-16 strings
This patch changes the function to instead utilize the `TextDecoder` for both kinds of UTF-16 BOM strings.
2022-01-14 20:38:40 +01:00
Jonas Jenwald
76444888fb Add (basic) UTF-8 support in the stringToPDFString helper function (issue 14449)
This patch implements this by looking for the UTF-8 BOM, i.e. `\xEF\xBB\xBF`, in order to determine the encoding.[1]
The actual conversion is done using the `TextDecoder` interface, which should be available in all environments/browsers that we support; please see https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder#browser_compatibility

---
[1] Assuming that everything lacking a UTF-16 BOM would have to be UTF-8 encoded really doesn't seem correct.
2022-01-14 18:57:07 +01:00
Jonas Jenwald
53d4ee7990 Prevent circular references in Type3 fonts
In corrupt PDF documents Type3 fonts may introduce circular dependencies, thus resulting in the affected font(s) never loading and parsing/rendering never completing.
Note that I've not seen any real-world examples of this kind of font corruption, but the attached PDF document was rather found in https://github.com/pdf-association/safedocs/tree/main/Miscellaneous%20Targeted%20Test%20PDFs

*Please note:* That repository contains a number of reduced test-cases that are specifically intended to test interoperability (between PDF viewer) and parsing/rendering for various kinds of strange/corrupt PDF documents.
Some of the test-cases found there may thus not make sense to try and "fix" upfront, in my opinion, unless the problems are also found in real-world PDF documents.
2022-01-13 17:58:37 +01:00
Jonas Jenwald
b9849e38b8 Expose even more API-functionality in the TypeScript definitions (issue 14435, PR 14013 follow-up)
While `PageViewport` apparently makes sense in TypeScript environments, given that it's being returned by the `PDFPageProxy.getViewport`-method in the API, we really don't want to extend the *public* API by simply exporting the class directly in `src/pdf.js` since it should never be called/initialized manually.
Hence we follow the same pattern as in PR 14013, and also extend the API unit-tests to ensure that `PDFPageProxy.getViewport` always returns a `PageViewport`-instance as expected.
2022-01-13 12:05:40 +01:00
Jonas Jenwald
08d88a0235 Ignore Annotations with empty /Rect-entries in the display-layer (issue 14438)
This prevents the `BaseSVGFactory.create`-method from throwing, and thus preventing any remaining Annotations (on the page) from rendering in corrupt documents.
2022-01-11 13:54:35 +01:00
Tim van der Meij
8ac0ccc227
Merge pull request #14424 from Snuffleupagus/mv-addLinkAttributes
[api-minor] Move `addLinkAttributes`, `LinkTarget`, and `removeNullCharacters` into the viewer (PR 14092 follow-up)
2022-01-08 13:19:11 +01:00
Calixte Denizet
6369617e6f [JS] Fix few errors around AFSpecial_Keystroke
- @cincodenada found some errors which are fixed in this patch;
 - it partially fixes issue #14306;
 - add some tests.
2022-01-08 12:34:56 +01:00
Calixte Denizet
9bb636402a Use the correct dimension to know if we have to add an EOL in vertical mode 2022-01-07 15:19:03 +01:00
Jonas Jenwald
7b8794b37e [api-minor] Move removeNullCharacters into the viewer
This helper function has never been used in e.g. the worker-thread, hence its placement in `src/shared/util.js` led to a *small* amount of unnecessary duplication.
After the previous patches this helper function is now *only* used in the viewer, hence it no longer seems necessary to expose it through the official API.

*Please note:* It seems somewhat unlikely that third-party users were relying *directly* on this helper function, which is why it's not being exported as part of the viewer components. (If necessary, we can always change this later on.)
2022-01-06 12:25:33 +01:00
Jonas Jenwald
2d2b6463b8 [api-minor] Move addLinkAttributes and LinkTarget into the viewer
As part of the changes/improvement in PR 14092, we're no longer using the `addLinkAttributes` directly in e.g. the AnnotationLayer-code.
Given that the helper function is now *only* used in the viewer, hence it no longer seems necessary to expose it through the official API.

*Please note:* It seems somewhat unlikely that third-party users were relying *directly* on the helper function, which is why it's not being exported as part of the viewer components. (If necessary, we can always change this later on.)
2022-01-06 12:25:33 +01:00
Calixte Denizet
6cdae5ac4d Use positive dimensions for text chunks in the text layer (issue #14415). 2022-01-05 10:49:56 +01:00
Jonas Jenwald
b0e774d9c5 Convert Catalog.getAllPageDicts to an async method
The patch in PR 14335 *essentially* re-introduced the old code from before PR 3848, however looking at this code a bit closer it should be possible to simplify it by making the method asynchronous.

While this method is currently only used as a *fallback* in corrupt documents, the way that `MissingDataException`s are handled is less than ideal. Note that if a `MissingDataException` is thrown, we're forced to re-parse the *entire* /Pages tree[1].
With this method now being asynchronous, we're able to handle fetching of References in a *much* easier/nicer way than before without having to throw `MissingDataException`s and re-parse anything.
These changes also let us simplify the call-site slightly, by calling the method *directly* instead of using the `PDFManager`-instance (since again it will no longer throw `MissingDataException`s).

Furthermore, this patch contains the following other changes:
 - Reduce unnecessary duplication in the various `catch` handlers throughout the method, by simply moving the `XRefEntryException` handling into the `addPageError` helper function instead.
 - Move the "circular references"-check to occur slightly earlier, since there's obviously no point in asynchronously fetching data just to then throw an Error *immediately* afterwards.

---
[1] Imagine e.g. a thousand page document, where there's a `MissingDataException` thrown when fetching/parsing page 900.
2021-12-31 22:03:10 +01:00
Jonas Jenwald
1491459dea Improve caching for the Catalog.getPageIndex method (PR 13319 follow-up)
This method is now being used a lot more, compared to when it's added, since it's now used together with scripting as part of the `PDFDocument.fieldObjects` parsing (called during viewer initialization).
For /Page Dictionaries that we've already parsed, the `pageIndex` corresponding to a particular Reference is already known and we're thus able to skip *all* parsing in the `Catalog.getPageIndex` method for those cases.
2021-12-29 20:29:14 +01:00
Jonas Jenwald
a20393e6e4 Update PDFDocument._getLinearizationPage to do the /Type-check correctly (PR 14400 follow-up)
I forgot about this in PR 14400, since we should obviously be consistent *and* given that the existing check is actually wrong; sorry about this!
2021-12-29 13:26:58 +01:00
Tim van der Meij
e42d54e1b5
Merge pull request #14400 from Snuffleupagus/getPageDict-async
[api-minor] Convert `Catalog.getPageDict` to an asynchronous method
2021-12-28 19:40:34 +01:00
Jonas Jenwald
b513c64d9d [api-minor] Convert Catalog.getPageDict to an asynchronous method
Besides converting `Catalog.getPageDict` to an `async` method, thus simplifying the code, this patch also allows us to pro-actively fix a existing issue.
Note how we're looking up References in such a way that `MissingDataException`s won't cause trouble, however it's *technically possible* that the entries (i.e. /Count, /Kids, and /Type) in a /Pages Dictionary could actually be indirect objects as well. In the existing code this could lead to *some*, or even all, pages failing to load/render as intended.
In practice that doesn't *appear* to happen in real-world PDF documents, but given all the weird things that PDF software do I'd prefer to fix this pro-actively (rather than waiting for a bug report).
With `Catalog.getPageDict` being `async` this is now really simple to address, however I didn't want to introduce a bunch more *unconditional* asynchronicity in this method if it could be avoided (since that could slow things down). Hence we'll *synchronously* lookup the *raw* data in a /Pages Dictionary, and only fallback to asynchronous data lookup when a Reference was encountered.

In addition to the above, this patch also makes the following notable changes:
 - Let `Catalog.getPageDict` *consistently* reject with the actual error, regardless of what data we're fetching. Previously we'd "swallow" the actual errors except when looking up Dictionary entries, which is inconsistent and thus seem unfortunate. As can be seen from the updated unit-tests this change is API-observable, hence why the patch is tagged `[api-minor]`.

 - Improve the consistency of the Dictionary /Type-checks in both the `Catalog.getPageDict` and `Catalog.getAllPageDicts` methods.
   In `Catalog.getPageDict` there's a fallback code-path where we're *incorrectly* checking the /Page Dictionary for a /Contents-entry, which is wrong since a /Page Dictionary doesn't need to have a /Contents-entry in order to be valid.
   For consistency the `Catalog.getAllPageDicts` method is also updated to handle errors in the /Type-lookup correctly.

 - Reduce the `PagesCountLimit.PAUSE_EAGER_PAGE_INIT` viewer constant, to further improve loading/rendering performance of the *second* page during initialization of very long documents; PR 14359 follow-up.
2021-12-25 15:22:48 +01:00
KouWakai
98158b67a3 Handle non-integer Annotation border widths correctly (issue 14203)
The existing code appears to be wrong, since according to the PDF specification the border width of an Annotation only has to be a number and not specifically an integer. Please see:
 - https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=392
 - https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2096210
 - https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G6.1965562
2021-12-24 22:10:19 +09:00
Jonas Jenwald
e0dba504d2 Fix broken/missing JSDocs and typedefs, to allow updating TypeScript to the latest version (issue 14342)
This patch circumvents the issues seen when trying to update TypeScript to version `4.5`, by "simply" fixing the broken/missing JSDocs and `typedef`s such that `gulp typestest` now passes.
As always, given that I don't really know anything about TypeScript, I cannot tell if this is a "correct" and/or proper way of doing things; we'll need TypeScript users to help out with testing!

*Please note:* I'm sorry about the size of this patch, but given how intertwined all of this unfortunately is it just didn't seem easy to split this into smaller parts.
However, one good thing about this TypeScript update is that it helped uncover a number of pre-existing bugs in our JSDocs comments.
2021-12-15 23:14:25 +01:00
Tim van der Meij
d3e1d7090a
Merge pull request #14370 from Snuffleupagus/getPageDict-sync-Pages
Slightly reduce asynchronicity in the `Catalog.getPageDict` method (PR 14338 follow-up)
2021-12-15 19:40:39 +01:00
Jonas Jenwald
760f765e56 Move the /Lang handling into the BaseViewer (PR 14114 follow-up)
In PR 14114 this was only added to the default viewer, which means that in the viewer components the user would need to *manually* implement /Lang handling. This was (obviously) a bad choice, since the viewer components already support e.g. structTrees by default; sorry about overlooking this!

To avoid having to make *two* `getMetadata` API-calls[1] very early during initialization, in the default viewer, the API will now cache its result. This will also come in handy elsewhere in the default viewer, e.g. by reducing parsing when opening the "document properties" dialog.

---
[1] This not only includes a round-trip to the worker-thread, but also having to re-parse the /Metadata-entry when it exists.
2021-12-14 13:19:05 +01:00
Jonas Jenwald
fa51fd9428 Slightly reduce asynchronicity in the Catalog.getPageDict method (PR 14338 follow-up)
After the changes in PR 14338, specifically in the `XRef.parse`-method, the /Pages-entry will now always have been fetched/validated when the `Catalog`-instance is created.
Hence we can directly access the /Pages-entry in `Catalog.getPageDict` and thus avoid *one* asynchronous data-lookup per page in the document. (In practice this is unlikely to show up in e.g. benchmarks, but it really cannot hurt.)

Finally, make sure that the `getPageDict`/`getAllPageDicts`-methods track the /Pages-tree reference correctly to prevent circular references in corrupt documents.
2021-12-13 21:18:06 +01:00
Tim van der Meij
a6dd39b645
Merge pull request #14358 from Snuffleupagus/checkLastPage-improvements
Improve `PDFDocument.checkLastPage`/`Catalog.getAllPageDicts` for documents with corrupt XRef tables (PR 14311, 14335 follow-up)
2021-12-11 13:07:54 +01:00
Tim van der Meij
70809a80ce
Merge pull request #14355 from Snuffleupagus/api-page-caches-Map
Change `WorkerTransport.{pageCache, pagePromises}` from an Array to a Map
2021-12-11 13:00:11 +01:00
Jonas Jenwald
70ac6b1694 Update Catalog.getAllPageDicts to always propagate the actual Errors (PR 14335 follow-up)
Rather than "swallowing" the actual Errors, when data fetching fails, ensure that they're always being propagated as intended to the call-site instead.
Note that we purposely handle `XRefEntryException` specially, to make it possible to fallback to indexing all XRef objects.
2021-12-10 15:22:36 +01:00
Jonas Jenwald
47f9eef584 Improve PDFDocument.checkLastPage for documents with corrupt XRef tables (PR 14311, 14335 follow-up)
Rather than trying, and failing, to fetch the entire /Pages-tree for documents with corrupt XRef tables, let's fallback to indexing all objects *before* trying to invoke the `Catalog.getAllPageDicts` method.
2021-12-10 11:45:09 +01:00
Jonas Jenwald
f39536a30b Change WorkerTransport.pagePromises from an Array to a Map
Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here.

Finally, also changes the `pagePromises` to a *private* property since it's not supposed to be accessed from the "outside".
2021-12-09 15:30:10 +01:00
Jonas Jenwald
c5525dcb69 Change WorkerTransport.pageCache from an Array to a Map
Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here.
For one thing, this simplifies iteration since we no longer have to worry about/check if `pageCache`-entries are undefined (which will happen for *sparse* `Array`s).

Of particular note is that we're no longer attempting to "null" the `pageCache`-entry from within the `PDFPageProxy._destroy`-method. Given that *synchronous* JavaScript will always run to completion[1] and that we're looping through all pages in `WorkerTransport.destroy` and immediately clear the cache afterwards, that code did/does not really make a lot of sense (as far as I can tell).

Finally, also changes the `pageCache` to a *private* property since it's not supposed to be accessed from the "outside".

---
[1] Unless there are errors, of course.
2021-12-09 15:29:47 +01:00
Jonas Jenwald
8a05db230e Further improve caching in Catalog.getPageDict, for disableAutoFetch mode (PR 8207 follow-up)
PR 8207 added caching to improve the performance of `Catalog.getPageDict`, by not having to repeatedly fetch the same data and also reducing the asynchronicity of that method.
However, because of *another* oversight on my part, we're only caching /Page references once we've found the correct page. As long as all pages are loaded *in order* this doesn't really matter (happens by default in the viewer), but when `disableAutoFetch` is used the pages may be fetched in a more random order (this patch reduces the asynchronicity of `Catalog.getPageDict` slightly in that case).
2021-12-09 12:54:49 +01:00
Tim van der Meij
97dc048e56
Merge pull request #14350 from Snuffleupagus/ccitt-infinite-loop
Prevent an infinite loop when parsing corrupt /CCITTFaxDecode data (issue 14305)
2021-12-08 20:01:21 +01:00
Jonas Jenwald
e8562173b8 Prevent an infinite loop when parsing corrupt /CCITTFaxDecode data (issue 14305)
Fixes one of the documents in issue 14305.
2021-12-07 13:57:25 +01:00
Jonas Jenwald
5f295ba280 Improve caching in Catalog.getPageDict (PR 8207 follow-up)
PR 8207 added caching to improve the performance of `Catalog.getPageDict`, by not having to repeatedly fetch the same data and also reducing the asynchronicity of that method.
However, because of annoying off-by-one errors[1] the caching became less efficient than it could/should be.[2] Note here that the /Pages-tree is zero-indexed, and that e.g. `pageIndex = 5` thus correspond to the *sixth* page of the document.

---
[1] In particular the `currentPageIndex + count < pageIndex` part.

[2] For example, even when loading a relatively small/simple document such as `tracemonkey.pdf` in the viewer, the number of `xref.fetchAsync(currentNode)` calls are reduced from `56` to `44` with this patch.
2021-12-06 11:49:31 +01:00
Tim van der Meij
335c4c8a43
Merge pull request #14338 from Snuffleupagus/XRef-more-Pages-validation
[api-minor] Clear all caches in `XRef.indexObjects`, and improve /Root dictionary validation in `XRef.parse` (issue 14303)
2021-12-04 13:23:40 +01:00
Tim van der Meij
3117985c55
Merge pull request #14340 from Snuffleupagus/Metadata-fetch-error
Handle errors when fetching the raw /Metadata (issue 14305)
2021-12-04 13:19:37 +01:00
Jonas Jenwald
d9fac34596 Ensure that the shadow helper function is passed a valid property (PR 14152 follow-up)
Trying to shadow a non-existent property is always an implementation mistake, since it leads to the `shadow`-call not having any effect.

In PR 14152 I overlooked the fact that it's fairly easy to enforce this during development/testing, since that can help catch e.g. simple spelling bugs.
2021-12-04 10:07:21 +01:00
Jonas Jenwald
40291d1943 Handle errors when fetching the raw /Metadata (issue 14305)
Currently the `Catalog.metadata` getter only handles errors during parsing, however in a *corrupt* PDF document fetching of the raw /Metadata can obviously fail as well.
Without this patch the `PDFDocumentProxy.getMetadata` method, in the API, can thus fail which it *never* should and this will cause the viewer to not initialize all state as expected.

Fixes one of the documents in issue 14305.
2021-12-04 09:41:42 +01:00
Jonas Jenwald
ad3a271fc4 [api-minor] Clear all caches in XRef.indexObjects, and improve /Root dictionary validation in XRef.parse (issue 14303)
*This patch improves handling of a couple of PDF documents from issue 14303.*

 - Update `XRef.indexObjects` to actually clear *all* XRef-caches. Invalid XRef tables *usually* cause issues early enough during parsing that we've not populated the XRef-cache, however to prevent any issues we obviously need to clear that one as well.

 - Improve the /Root dictionary validation in `XRef.parse` (PR 9827 follow-up). In addition to checking that a /Pages entry exists, we'll now also check that it can be successfully fetched *and* that it's of the correct type. There's really no point trying to use a /Root dictionary that e.g. `Catalog.toplevelPagesDict` will reject, and this way we'll be able to fallback to indexing the objects in corrupt documents.

 - Throw an `InvalidPDFException`, rather than a general `FormatError`, in `XRef.parse` when no usable /Root dictionary could be found. That really seems more appropriate overall, since all attempts at parsing/recovery have failed. (This part of the patch is API-observable, hence the tag.)

With these changes, two existing test-cases are improved and the unit-tests are updated/re-factored to highlight that. In particular `GHOSTSCRIPT-698804-1-fuzzed.pdf` will now both load and "render" correctly, whereas `poppler-395-0-fuzzed.pdf` will now fail immediately upon loading (rather than *appearing* to work).
2021-12-03 11:57:38 +01:00
Jonas Jenwald
1fac6371d3 [Regression] Eagerly fetch/parse the entire /Pages-tree in corrupt documents (issue 14303, PR 14311 follow-up)
*Please note:* This is similar to the method that existed prior to PR 3848, but the new method will *only* be used as a fallback when parsing of corrupt PDF documents.

The implementation in PR 14311 unfortunately turned out to be *way* too simplistic, as evident by the recently added test-files in issue 14303, since it may *cause* infinite loops in `PDFDocument.checkLastPage` for some corrupt PDF documents.[1]
To avoid this, the easiest solution that I could come up with was to fallback to eagerly parsing the *entire* /Pages-tree when the /Count-entry validation fails during document initialization.

Fixes *at least* two of the issues listed in issue 14303, namely the `poppler-395-0.pdf...` and `GHOSTSCRIPT-698804-1.pdf...` documents.

---
[1] The whole point of PR 14311 was obviously to *get rid of* infinte loops during document initialization, not to introduce any more of those.
2021-12-02 14:31:04 +01:00
Jonas Jenwald
e045cd4520 Remove the unused skipCount parameter from Catalog.getPageDict (PR 14311 follow-up)
This was added in PR 14311, but given that I completely missed to update the `PDFDocument.getPage` signature accordingly it's completely unused.
Given that things work just as fine as-is, let's simply remove that optional parameter for now; sorry about the churn here!
2021-12-02 11:51:38 +01:00
Jonas Jenwald
63be23f05b Handle errors correctly when data lookup fails during /Pages-tree parsing (issue 14303)
This only applies to severely corrupt documents, where it's possible that the `Parser` throws when we try to access e.g. a /Kids-entry in the /Pages-tree.

Fixes two of the issues listed in issue 14303, namely the `poppler-742-0.pdf...` and `poppler-937-0.pdf...` documents.
2021-12-02 10:54:40 +01:00
Jonas Jenwald
a807ffe907 Prevent circular references in XRef tables from hanging the worker-thread (issue 14303)
*Please note:* While this patch on its own is sufficient to prevent the worker-thread from hanging, however in combination with PR 14311 these PDF documents will both load *and* render correctly.

Rather than focusing on the particular structure of these PDF documents, it seemed (at least to me) to make sense to try and prevent all circular references when fetching/looking-up data using the XRef table.
To avoid a solution that required tracking the references manually everywhere, the implementation settled on here instead handles that internally in the `XRef.fetch`-method. This should work, since that method *and* the `Parser`/`Lexer`-implementations are completely synchronous.

Note also that the existing `XRef`-caching, used for all data-types *except* Streams, should hopefully help to lessen the performance impact of these changes.
One *potential* problem with these changes could be certain *browser* exceptions, since those are generally not catchable in JavaScript code, however those would most likely "stop" worker-thread parsing anyway (at least I hope so).

Finally, note that I settled on returning dummy-data rather than throwing an exception. This was done to allow parsing, for the rest of the document, to continue such that *one* bad reference doesn't prevent an entire document from loading.

Fixes two of the issues listed in issue 14303, namely the `poppler-91414-0.zip-2.gz-53.pdf` and `poppler-91414-0.zip-2.gz-54.pdf` documents.
2021-11-27 23:50:26 +01:00
Jonas Jenwald
a669fce762 Inline the isDict, isRef, and isStream checks in the src/core/xref.js file 2021-11-27 23:49:17 +01:00
Jonas Jenwald
680e0efb9d Use Array-destructuring in the XRef.readXRefStream-method 2021-11-27 23:49:17 +01:00
Jonas Jenwald
d0c4bbd828 [api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303)
*This patch basically extends the approach from PR 10392, by also checking the last page.*

Currently, in e.g. the `Catalog.numPages`-getter, we're simply assuming that if the /Pages-tree has an *integer* /Count entry it must also be correct/valid.
As can be seen in the referenced PDF documents, that entry may be completely bogus which causes general parsing to breaking down elsewhere in the worker-thread (and hanging the browser).

Rather than hoping that the /Count entry is correct, similar to all other data found in PDF documents, we obviously need to validate it. This turns out to be a little less straightforward than one would like, since the only way to do this (as far as I know) is to parse the *entire* /Pages-tree and essentially counting the pages.
To avoid doing that for all documents, this patch tries to take a short-cut by checking if the last page (based on the /Count entry) can be successfully fetched. If so, we assume that the /Count entry is correct and use it as-is, otherwise we'll iterate through (potentially) the *entire* /Pages-tree to determine the number of pages.

Unfortunately these changes will have a number of *somewhat* negative side-effects, please see a possibly incomplete list below, however I cannot see a better way to address this bug.
 - This will slow down initial loading/rendering of all documents, at least by some amount, since we now need to fetch/parse more of the /Pages-tree in order to be able to access the *last* page of the PDF documents.
 - For poorly generated PDF documents, where the entire /Pages-tree only has *one* level, we'll unfortunately need to fetch/parse the *entire* /Pages-tree to get to the last page. While there's a cache to help reduce repeated data lookups, this will affect initial loading/rendering of *some* long PDF documents,
 - This will affect the `disableAutoFetch = true` mode negatively, since we now need to fetch/parse more data during document initialization. While the `disableAutoFetch = true` mode should still be helpful in larger/longer PDF documents, for smaller ones the effect/usefulness may unfortunately be lost.

As one *small* additional bonus, we should now also be able to support opening PDF documents where the /Pages-tree /Count entry is completely invalid (e.g. contains a non-integer value).

Fixes two of the issues listed in issue 14303, namely the `poppler-67295-0.pdf` and `poppler-85140-0.pdf` documents.
2021-11-27 21:57:35 +01:00
Tim van der Meij
9a1e27efc5
Merge pull request #14313 from Snuffleupagus/PDFDocument_pagePromises-map
Change the `_pagePromises` cache, in the worker, from an Array to a Map
2021-11-27 20:58:23 +01:00
calixteman
bbd8b5ce9f
Merge pull request #14319 from calixteman/xfa_arc
XFA - Draw arcs correctly
2021-11-27 11:32:32 -08:00
Calixte Denizet
31e13515f5 XFA - Draw arcs correctly
- it aims to fix #14315;
- take into account the startAngle to compute the coordinates of the final point.
2021-11-27 19:30:12 +01:00
Calixte Denizet
cfdaa57353 Handle sub/super-scripts in rich text
- it aims to fix #14317;
 - change the fontSize and the verticalAlign properties according to the position of the text.
2021-11-27 16:06:09 +01:00
Jonas Jenwald
4c56214ab4 Convert PDFDocument._getLinearizationPage to an async method
This, ever so slightly, simplifies the code and reduces overall indentation.
2021-11-26 19:57:47 +01:00
Jonas Jenwald
080996ac68 Change the _pagePromises cache, in the worker, from an Array to a Map
Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here.
Furthermore, this patch also adds (currently missing) caching for XFA-documents. Loading a couple of such documents in the viewer, with logging added, shows that we're currently re-creating `Page`-instances unnecessarily for XFA-documents.
2021-11-26 19:53:57 +01:00
Jonas Jenwald
ca8d2bdce4 Abort parsing when the XRef /W-array contain bogus entries (issue 14303)
For this particular PDF document, we have `/W [1 2 166666666666666666666666666]` which obviously makes no sense.

While this patch makes no attempt at actually validating the entries in the /W-array, we'll now simply abort all processing when the end of the PDF document has been reached (thus preventing hanging the browser).
Please note that this patch doesn't enable the PDF document to be loaded/rendered, but at least it fails "correctly" now.

Fixes one of the issues listed in issue 14303, namely the `REDHAT-1531897-0.pdf`document.
2021-11-25 18:35:08 +01:00
Jonas Jenwald
ae4f1ae3e7 Ensure that ChunkedStream won't attempt to request data *beyond* the document size (issue 14303)
This bug was surprisingly difficult to track down, since it didn't just depend on range-requests being used but also on how quickly the document was loaded. To even be able to reproduce this locally, I had to use a very small `rangeChunkSize`-value (note the unit-test).

The cause of this bug is a bogus entry in the XRef-table, causing us to attempt to request data from *beyond* the actual document size and thus getting into an infinite loop.

Fixes *one* of the issues listed in issue 14303, namely the `PDFBOX-4352-0.pdf` document.
2021-11-24 19:19:43 +01:00
Jonas Jenwald
6da0944fc7 [api-minor] Replace PDFDocumentProxy.getStats with a synchronous PDFDocumentProxy.stats getter
*Please note:* These changes will primarily benefit longer documents, somewhat at the expense of e.g. one-page documents.

The existing `PDFDocumentProxy.getStats` function, which in the default viewer is called for each rendered page, requires a round-trip to the worker-thread in order to obtain the current document stats. In the default viewer, we currently make one such API-call for *every rendered* page.
This patch proposes replacing that method with a *synchronous* `PDFDocumentProxy.stats` getter instead, combined with re-factoring the worker-thread code by adding a `DocStats`-class to track Stream/Font-types and *only send* them to the main-thread *the first time* that a type is encountered.

Note that in practice most PDF documents only use a fairly limited number of Stream/Font-types, which means that in longer documents most of the `PDFDocumentProxy.getStats`-calls will return the same data.[1]
This re-factoring will obviously benefit longer document the most[2], and could actually be seen as a regression for one-page documents, since in practice there'll usually be a couple of "DocStats" messages sent during the parsing of the first page. However, if the user zooms/rotates the document (which causes re-rendering), note that even a one-page document would start to benefit from these changes.

Another benefit of having the data available/cached in the API is that unless the document stats change during parsing, repeated `PDFDocumentProxy.stats`-calls will return *the same identical* object.
This is something that we can easily take advantage of in the default viewer, by now *only* reporting "documentStats" telemetry[3] when the data actually have changed rather than once per rendered page (again beneficial in longer documents).

---
[1] Furthermore, the maximium number of `StreamType`/`FontType` are `10` respectively `12`, which means that regardless of the complexity and page count in a PDF document there'll never be more than twenty-two "DocStats" messages sent; see 41ac3f0c07/src/shared/util.js (L206-L232)

[2] One example is the `pdf.pdf` document in the test-suite, where rendering all of its 1310 pages only result in a total of seven "DocStats" messages being sent from the worker-thread.

[3] Reporting telemetry, in Firefox, includes using `JSON.stringify` on the data and then sending an event to the `PdfStreamConverter.jsm`-code.
In that code the event is handled and `JSON.parse` is used to retrieve the data, and in the "documentStats"-case we'll then iterate through the data to avoid double-reporting telemetry; see https://searchfox.org/mozilla-central/rev/8f4c180b87e52f3345ef8a3432d6e54bd1eb18dc/toolkit/components/pdfjs/content/PdfStreamConverter.jsm#515-549
2021-11-20 12:20:55 +01:00
Tim van der Meij
41ac3f0c07
Merge pull request #14291 from Snuffleupagus/force-postMessageTransfers
[api-minor] Only use Workers when `postMessage` transfers are supported (PR 11123 follow-up)
2021-11-19 20:02:51 +01:00
Brendan Dahl
c6cb39ef30
Merge pull request #14262 from Snuffleupagus/issue-14261
Include the /Lang-property, when it exists, in the StructTree-data (issue 14261)
2021-11-19 07:51:21 -08:00
Jonas Jenwald
6f22327e61 [api-minor] Only use Workers when postMessage transfers are supported (PR 11123 follow-up)
Given that all modern browsers now support `postMessage` transfers, and have for years, it no longer seems necessary for the PDF.js library to support using Workers unless the `postMessage` transfers functionality is available.
This patch is a follow-up to PR 11123, which made it impossible to *manually* disable `postMessage` transfers for performance reasons (since it increases memory usage), which hasn't caused any bug reports as far as I know.[1]

Hence we'll now only support *proper* Worker implementations, with fully working `postMessage` transfers, and fallback to using "fake" Workers otherwise.

---
[1] At the time of that PR we still "supported" IE, which is why this code was left intact.
2021-11-19 16:47:58 +01:00
Tim van der Meij
3dccaccbb4
Merge pull request #14278 from Snuffleupagus/rm-removeChild
Replace the remaining `Node.removeChild()` instances with `Element.remove()`
2021-11-17 20:17:55 +01:00
Jonas Jenwald
4ef1a129fa Replace the remaining Node.removeChild() instances with Element.remove()
Using `Element.remove()` is a slightly more compact way of removing an element, since you no longer need to explicitly find/use its parent element.
Furthermore, the patch also replaces a couple of loops that're used to delete all elements under a node with simply overwriting the contents directly (a pattern already used throughout the viewer).

See also:
 - https://developer.mozilla.org/en-US/docs/Web/API/Node/removeChild
 - https://developer.mozilla.org/en-US/docs/Web/API/Element/remove
2021-11-16 17:52:50 +01:00
Brendan Dahl
3209c013c4
Merge pull request #14247 from calixteman/button
[api-minor] Render pushbuttons on their own canvas (bug 1737260)
2021-11-16 08:10:40 -08:00
Jonas Jenwald
971ac8e993 Include the /Lang-property, when it exists, in the StructTree-data (issue 14261)
*Please note:* This is a tentative patch, since I don't have the necessary a11y-software to actually test it.
2021-11-14 12:37:41 +01:00
Jonas Jenwald
a54bed4963 Enable the ESLint no-loss-of-precision rule
Please refer to https://eslint.org/docs/rules/no-loss-of-precision
2021-11-14 10:48:50 +01:00
calixteman
85c6dd59ce
Merge pull request #14268 from calixteman/outline
Remove non-displayable chars from outline title (#14267)
2021-11-13 08:12:56 -08:00
Calixte Denizet
7041c62ccf Remove non-displayable chars from outline title (#14267)
- it aims to fix #14267;
 - there is nothing about chars in range [0-1F] in the specs but acrobat doesn't display them in any way.
2021-11-13 16:56:08 +01:00
Jonas Jenwald
afcc99a86d When parsing corrupt documents without any trailer-dictionary, fallback to the "top"-dictionary (issue 14269)
There's obviously no guarantee that this will work in general, if the document is sufficiently corrupt, but it should hopefully be better than just throwing `InvalidPDFException` as currently happens.

Please note that, as is often the case with corrupt documents, it's somewhat difficult to know if we're rendering the document "correctly" with this patch[1]. In this case even Adobe Reader cannot open the document, which is always a good sign that it's *really* corrupt, however we're at least able to render *something* with this patch.

---
[1] Whatever "correct" even means when dealing with corrupt PDF documents, where often times different PDF viewers won't agree completely.
2021-11-13 13:21:38 +01:00
Jonas Jenwald
28fb3975eb
Merge pull request #14266 from calixteman/bug931481
Don't consider space as real space when there is an extra spacing (bug 931481)
2021-11-12 21:42:32 +01:00
Calixte Denizet
a88ff34eb7 Don't consider space as real space when there is an extra spacing (bug 931481)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=931481;
 - real space chars are pushed in the chunk but when there is an extra spacing, the next char position must be compared with the previous one;
 - for example, an extra spacing can cancel a space so visually there are no space.
2021-11-12 18:53:48 +01:00
Calixte Denizet
5b7e1f5232 XFA - Avoid an exception when looking for a font in a parent node
- it aims to fix issue https://github.com/mozilla/pdf.js/issues/14150;
  - a parent can be null in case the root has been reached, so just add a check.
2021-11-12 16:27:08 +01:00
Calixte Denizet
33ea817b20 [api-minor] Render pushbuttons on their own canvas (bug 1737260)
- First step to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1737260;
 - several interactive pdfs use the possibility to hide/show buttons to show different icons;
 - render pushbuttons on their own canvas and then insert it the annotation_layer;
 - update test/driver.js in order to convert canvases for pushbuttons into images.
2021-11-12 15:37:33 +01:00
Jonas Jenwald
ea1c348c67 Always prefer abbreviated keys, over full ones, when doing any dictionary lookups (issue 14256)
Note that issue 14256 was specifically about *inline* images, please refer to:
 - https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.1852045
 - https://www.pdfa.org/safedocs-unearths-pdf-inline-image-issue/
 - https://pdf-issues.pdfa.org/32000-2-2020/clause08.html#H8.9.7

However, during review of the initial PR in https://github.com/mozilla/pdf.js/pull/14257#issuecomment-964469710, it was suggested that we instead do this *unconditionally for all* dictionary lookups.
In addition to re-ordering the existing call-sites in the `src/core`-code, and adding non-PRODUCTION/TESTING asserts to catch future errors, for consistency a number of existing `if`/`switch`-blocks were re-factored to also check the abbreviated keys first.
2021-11-10 11:56:18 +01:00
calixteman
4bb9de4b00
Merge pull request #14239 from calixteman/1739502
XFA - Fix a breakBefore issue when target is a contentArea and startNew is 1 (bug 1739502)
2021-11-08 03:14:42 -08:00
Calixte Denizet
13ae6d493a XFA - Encode tag names in UTF-8 when saving (fix #14249) 2021-11-07 21:41:37 +01:00
calixteman
efb4455749
Merge pull request #14240 from calixteman/14014
XFA - Get each page asynchronously in order to avoid blocking the event loop (#14014)
2021-11-06 13:21:43 -07:00
Calixte Denizet
1681e25008 XFA - Get each page asynchronously in order to avoid blocking the event loop (#14014) 2021-11-06 13:25:03 +01:00
Brendan Dahl
b56cca0324 Create shading patterns the size of the current path. (bug 1722807)
Previously, when we created a shading pattern canvas we created it
as the same size as the page. This was good for caching if the same
pattern was used over and over again, but when lots of different
shadings are created that caused us to create many full page
canvases.

Instead of creating the full page canvses, create the canvas
as the same size as the current path bounding box. This reduces memory
consumption by a lot since most paths are pretty small. Also, in real world
PDFs it's rare for a shading (non shading fill) to be reused over and over again.
Bug 1721949 is an example where the same pattern is reused and it will be slightly
slower than before.
2021-11-05 20:44:18 -07:00
Brendan Dahl
8161d3f29d Don't double apply a group xobject's bbox.
In `beginGroup` we create a new canvas that is the size of the
bounding box and we translate it to the offset. This means we don't need to
also apply the bounding box during `paintFormXObjectBegin`.

This improves #6961 quite a bit, but it still is missing the indention
in the ruler.
2021-11-05 15:40:58 -07:00
Calixte Denizet
a08763f4aa XFA - Fix a breakBefore issue when target is a contentArea and startNew is 1 (bug 1739502)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1739502;
 - when the target area was the current content area, everything was pushed in it instead of creating a new one (and consequently a new pageArea is created).
 - the pdf shows an alignment issue on page 4:
   - the hAlign is "center" but the subform was the width of its parent, so compute the real width of the subform with tb layout;
 - there is an extra empty page at the end of the pdf:
   - there is a subform with some hidden elements which are not rendered for now (since there is no plugged JS engine it isn't possible to draw them in changing their visibility).
   - so in case a subform is empty and has no real dimensions (at least one is 0), we just consider it as empty.
2021-11-05 18:59:55 +01:00
calixteman
e136afbabc
Merge pull request #14218 from janekotovich/subform_min_0
XFA subform with occur min=0 and no bound data displaying.
2021-11-05 04:12:34 -07:00
Jonas Jenwald
8222d6530b
Merge pull request #14232 from brendandahl/show-text-pattern
Use correct matrix for patterns with showText.
2021-11-05 10:04:56 +01:00
Brendan Dahl
1c7048399b Use correct matrix for patterns with showText.
We were incorrectly using the transform in the pattern before it had been
adjusted causing the pattern to be misplaced relative to the page.

Fixes: ShowText-ShadingPattern.pdf (already in corpus)
Fixes: #8111
Fixes: #9243
2021-11-04 16:57:36 -07:00
Jane-Kotovich
56b502391c XFA subform with occur min=0 and no bound data displaying
Subfrom nomin displays even though it's subform is set to <occur max=-1 min=0>
If we look through specs of XFA 3.3 : https://www.pdfa.org/norm-refs/XFA-3_3.pdf
- The min attribute is used when processing a form that contains data. Regardless of the data at least this number of instances is included. It is permissible to set this value to zero, in which case the container is entirely excluded if there is no data for it.

However, in our case it doesn't happen, because we let our empty dataNode get through. Though by setting a clause:
- eliminate unmatched data with occur min=0
we are checking our empty data and sending it to uselessNode array where at the end it gets removed;
2021-11-04 20:22:05 +10:00
Jonas Jenwald
e1a35e7bb6
Merge pull request #14213 from Snuffleupagus/issue-11656
Tweak the Bidi-detection heuristics for very short RTL strings (issue 11656)
2021-11-03 22:09:14 +01:00
Jonas Jenwald
5f77d3719b Tweak the Bidi-detection heuristics for very short RTL strings (issue 11656)
Very short strings can narrowly miss the existing Bidi-detection threshold, leading to incorrect text-selection and copying behaviour.

In my testing, neither Adobe Reader or PDFium seem to handle copying "correctly" for this document. Hence it's not entirely clear to me that we actually want to fix this, since tweaking these heuristics can *obviously* cause regressions elsewhere (and our test coverage for RTL-text isn't exactly great).
2021-11-03 20:31:57 +01:00
Brendan Dahl
039a7a670f Reset path bounding box tracking when starting a new path.
Starting a new path will wipe out any of the current subpaths in the
current graphics state, so we should reset the min/maxes.

This makes a number of the bounding boxes smaller and reduces the number
of composed pixels. For the smask tests in the corpus, the number of
composed pixesl goes from 19,872,109 to 19,676,905. The difference is much
larger on other PDFs though.
2021-11-03 11:46:52 -07:00
Jonas Jenwald
8c70258065
Merge pull request #14182 from calixteman/richtext
Support rich content in markup annotation
2021-10-31 14:41:56 +01:00
Calixte Denizet
cf8dc750d6 Support rich content in markup annotation
- use the xfa parser but in the xhtml namespace.
2021-10-31 13:44:51 +01:00
calixteman
2d8b6fda8f
Merge pull request #14207 from janekotovich/forms_version_popup
JS - Avoid a popup to ask for specific version of Acrobat
2021-10-30 05:45:31 -07:00
Tim van der Meij
ec1633c33c
Merge pull request #14201 from Snuffleupagus/bug-1219400
Use the correct border-style for Annotations, when a dash array is specified (bug 1219400)
2021-10-30 12:39:46 +02:00
Jane-Kotovich
12f89d2ab1 JS - Avoid a popup to ask for specific version of Acrobat
Embedded JS in PDF keep throwing alert reagdring specific version of Acrobat (Spanish and version 5.0 or greater).
This happens because:
- JS in pdf is enabled
- PDF contains some unsupported features (e.g. XFA)
Alert come when app.formVersion = undefined || app.formVersion < 5.0
In pdf.js we were using FORM_VERSION = undefined. After researching based on https://opensource.adobe.com/dc-acrobat-sdk-docs/acrobatsdk/pdfs/acrobatsdk_jsapiref.pdf\#G4.1993509 and Acrobat DC we decided to go with the larger number to avoid unnecessary popups.
Through investigation we realise that VIEWER_VERSION should have same value - a number.
Due to all that, we implemented 21.00720099 as a value for both FORMS_VERSION and VIEWER_VERSION
2021-10-29 23:09:59 +10:00
Tim van der Meij
0e7614df7f
Merge pull request #14180 from Snuffleupagus/bug-1627427
Handle ranges that "overflow" the last byte in `CMap.mapBfRange` (bug 1627427)
2021-10-27 20:06:09 +02:00
Jonas Jenwald
884caf602e Use the correct border-style for Annotations, when a dash array is specified (bug 1219400)
Even though we cannot use the dash array in the display layer, at least ensure that we use the correct border-style.
2021-10-27 13:20:21 +02:00
Jane-Kotovich
91fc643ff9 [api-minor] Implement securityHandler in the scripting API (bug 1731578) 2021-10-26 23:42:04 +10:00
Jonas Jenwald
aa1b78684f Handle ranges that "overflow" the last byte in CMap.mapBfRange (bug 1627427) 2021-10-24 13:48:38 +02:00
Tim van der Meij
0aaa4e3dbe
Merge pull request #14156 from Snuffleupagus/escodegen-fork
Add support for modern ECMAScript `class` features
2021-10-23 19:12:44 +02:00
Jonas Jenwald
52372b9378
Merge pull request #14175 from brendandahl/smask-v2
Use a new method for handling soft masks.
2021-10-23 09:27:18 +02:00
Brendan Dahl
82681ea20c Track the clipping box and bounding box of the path.
This allows us to compose much smaller regions of soft
mask making them much faster. This should also allow
for further optimizations in the pattern code.

For example locally I see issue #6573 go from 55s
to 5s with this change.

Fixes #6573
2021-10-22 13:41:29 -07:00
Brendan Dahl
2d1f9ff7a3 Use a new method for handling soft masks.
The old method of handling soft masks had a number of issues where the temporary
drawing canvas and the suspended main canvas could get out of sync
(e.g. mismatched save/restores or clip state) or we could end up compositing at
the wrong time. A good example of things getting out sync is the reduced test
case in #9017.

To fix this I've changed two big things:

1) Duplicate all the needed graphics state from the temporary canvas to the
suspended main canvas. This ensure the canvases stay in sync so that when we
switch back to the main canvas the graphics state stack is the same
(e.g. transforms, clip paths).

2) Immediately composite after each drawing operation. This ensures that if
there's an active clip region that we'll still be able to composite the correct
portions of the canvas. Note: This solution could be avoided by using
getImageData and putImageData since those ignore clipping region, but this is
very very slow. Note2: I also think the old way of only compositing at the end
of the soft mask is incorrect and can lead to wrong colors if drawing over the
same region, but in practice this doesn't seem to matter much.

Fixes: #5781
Fixes: #5853
Fixes: #7267
Fixes: #7891
Fixes: #8403
Fixes: #8624
Fixes: #12798
Fixes: #13891
Fixes: #9017 (reduced test case)
Fixes: https://bugzilla.mozilla.org/show_bug.cgi?id=1703683
2021-10-22 13:41:21 -07:00
Jonas Jenwald
89785a23f3 Convert Metadata to use private class fields
Please refer to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Classes/Private_class_fields
2021-10-22 22:01:19 +02:00
Tim van der Meij
11f030d301
Merge pull request #14171 from Snuffleupagus/issue-14170
Prevent run-time errors in Node.js versions with `URL.createObjectURL` support (issue 14170)
2021-10-22 21:07:19 +02:00
Jonas Jenwald
044197808a Prevent double-rendering borders for PushButton-annotations (PR 14083 follow-up)
With ResetForm-action support added in PR 14083, there's a regression in the `issue12716` test-case. More specifically the border around the "Clear Form"-link is now rendered *twice*, once in the canvas via the appearance-stream and once in the annotationLayer via the border-data.
This looks slightly weird, and was most likely not intended, which is why this patch suggests that we ignore the border in the annotationLayer when an appearance-stream exists.
2021-10-21 13:31:16 +02:00
Jonas Jenwald
ff9d2b2ab1 Prevent run-time errors in Node.js versions with URL.createObjectURL support (issue 14170)
Apparently Node.js has added *global* `URL.createObjectURL` support, but not done the same thing for `Blob`. Hence we also need to check for the availability of `Blob` in the `createObjectURL` helper function, and it's probably a good idea to also update `examples/node/pdf2svg.js` to work-around this until these changes reach an official PDF.js release.
2021-10-21 10:32:44 +02:00
Tim van der Meij
382be22c11
Merge pull request #14160 from Snuffleupagus/pr-13770-followup
Fix pattern handling regression in `SVGGraphics` (PR 13770 follow-up)
2021-10-19 19:31:18 +02:00
Brendan Dahl
b66239d6dc
Merge pull request #14114 from Snuffleupagus/issue-14110
[api-minor] Include the /Lang-property in the `documentInfo`, and use it in the viewer (issue 14110)
2021-10-19 08:08:08 -07:00
Jonas Jenwald
68e6622c57 Ignore Square/Circle-annnotations with a zero borderWidth when creating a fallback appearance stream (issue 14164)
Trying to render these Annotation-types, when the borderWidth is `0`, causes a "hairline" border to appear. If these Annotations included an appearance stream, as they are supposed to, this wouldn't have happened and the simplest solution here seem to be to just ignore these particular Annotations.
2021-10-19 15:27:42 +02:00
Jonas Jenwald
8c6f1e45c7 Fix pattern handling regression in SVGGraphics (PR 13770 follow-up)
While the FAQ clearly lists the SVG back-end as unsupported, see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#backends, I suppose that small/simple regressions still makes sense to fix.
2021-10-18 21:40:10 +02:00
calixteman
bbb64369f1
Merge pull request #13424 from calixteman/chunks2
[api-minor] Fix issues in text selection
2021-10-18 06:14:15 -07:00
Calixte Denizet
61d1063276 Fix issues in text selection
- PR #13257 fixed a lot of issues but not all and this patch aims to fix almost all remaining issues.
  - the idea in this new patch is to compare position of new glyph with the last position where a glyph has been drawn;
    - no space are "drawn": it just moves the cursor but they aren't added in the chunk;
    - so this way a space followed by a cursor move can be treated as only one space: it helps to merge all spaces into one.
  - to make difference between real spaces and tracking ones, we used a factor of the space width (from the font)
    - it was a pretty good idea in general but it fails with some fonts where space was too big:
    - in Poppler, they're using a factor of the font size: this is an excellent idea (<= 0.1 * fontSize implies tracking space).
2021-10-17 16:27:05 +02:00
Jonas Jenwald
00720d059a [api-minor] Include the /Lang-property in the documentInfo, and use it in the viewer (issue 14110)
*Please note:* This is a tentative patch, since I don't have the necessary a11y-software to actually test it.

To avoid having to add a new API-method just for a single string, I figured that adding the new property to the existing `documentInfo`-data (accessed via `PDFDocumentProxy.getMetadata` in the API) will hopefully be deemed acceptable.
2021-10-16 14:27:47 +02:00
Jonas Jenwald
0041230072 Re-name the XFAFactory.numberPages getter to XFAFactory.numPages for consistency
All other similar getters are called `numPages` throughout the code-base, and improved consistency should always be a good thing.
2021-10-16 12:56:21 +02:00
Jonas Jenwald
0e5348180e Fix the inconsistent return type of the PDFDocument.isPureXfa getter
Also (slightly) simplifies a couple of small getters/methods related to the `XFAFactory`-instance.
2021-10-16 12:56:20 +02:00
Jonas Jenwald
cd94a44ca1 Remove some duplication in *simple* shadowed getters in src/core/-code
In these cases there's no good reason, in my opinion, to duplicate the `shadow`-lines since that unnecessarily increases the risk of simple typos (see the previous patch).
2021-10-16 12:56:17 +02:00
Jonas Jenwald
1450da4168 Fix a xfaFaxtory typo in the shadowing in the PDFDocument.xfaFactory getter
With this typo the shadowing doesn't actually work, which causes these checks to be unnecessarily repeated. In this particular case it didn't have a significant performance impact, however we should definately fix this nonetheless.
2021-10-16 11:54:12 +02:00
Jane-Kotovich
c2af309917 XFA - Embedded image is missing 2021-10-15 21:12:29 +10:00
Tim van der Meij
f6d9d91965
Merge pull request #14116 from Snuffleupagus/api-more-optional-chaining
Use even more optional chaining in the `src/display/api.js` file
2021-10-13 19:38:03 +02:00
Jay Berkenbilt
586295fad6 Implement TrueType character map "format 2" (fixes #14117)
If a PDF included an embedded TrueType font whose preferred character
map (cmap) was in "format 2", the code would select that character map
and then refuse to read it because of an unsupported format, thus
causing the characters not to be rendered. This commit implements
support for format 2 as described at the link below.

https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6cmap.html
2021-10-13 07:37:14 -04:00
Jonas Jenwald
8fc9c7e41c Use even more optional chaining in the src/display/api.js file
This patch (slightly) simplifies a couple of `onProgress` and `onUnsupportedFeature` call-sites.
Finally, while unrelated, also removes some unnecessary `return undefined;` statements (PR 11601 follow-up).
2021-10-12 12:05:59 +02:00
Jonas Jenwald
8721557a08 For Annotations that define a closed area, make all of it toggle the PopupAnnotation (issue 14107)
For Circle, Square, and Polygon Annotations it's currently only possible to toggle the associated PopupAnnotation by clicking on its border. Depending on the border width, and also the current zoom-level in the viewer, that can make interacting with certain Annotations *practically* impossible (which is the case in issue 14107).
Hence, in order to improve this, change the "fill"-property of the SVG element in the annotationLayer to make the *entire* element part of the click/mouse-over target.

*Please note:* Given that this is a viewer-related issue, there's no simple way to test this as far as I can tell.
2021-10-09 15:55:15 +02:00
Tim van der Meij
56e3ef68d4
Merge pull request #14106 from calixteman/names
Empty name is allowed in ISO 32000
2021-10-09 14:29:10 +02:00
Jonas Jenwald
69a97bcba7 Take the /CIDToGIDMap data into account when computing the hash, in PartialEvaluator.preEvaluateFont, for composite fonts (bug 1734802)
This is unfortunately *yet another* bug in the `preEvaluateFont`-implementation, and I've lost count of the number of times I've had to tweak this code over the years :-(
I really cannot help thinking that PR 4423 was way too simplistic, since it missed a bunch of cases that leads to broken font rendering in many PDF documents.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1734802
2021-10-08 13:15:21 +02:00
Calixte Denizet
f384ad2356 Empty name is allowed in ISO 32000
- the exact sentence from the spec:
    "The token SOLIDUS (a slash followed by no regular characters) introduces a unique valid name defined by the empty sequence of characters."
  - so just remove the warning.
2021-10-06 20:50:39 +02:00
Jonas Jenwald
d49b1bf2ee Use the native structuredClone implementation when it's available
With a recent addition to the HTML specification, the internal structured clone algorithm used in browsers is (or will be, once it's implemented) *directly* accessible to JavaScript; please see https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/structuredClone

Hence we'll *eventually* not need to maintain our own structured clone functionality in the `LoopbackPort`-class in the API, however for the time being we'll feature detect `structuredClone` and fallback to the existing PDF.js implementation.

Given that https://bugzilla.mozilla.org/show_bug.cgi?id=1722576 has landed in Firefox 94, we should no longer need the manually implemented `cloneValue`-functionality in MOZCENTRAL builds. Note also that in the Firefox built-in PDF Viewer it's not possible for users to *easily* disable workers, which should further reduce the risk of these changes.
2021-10-03 10:55:33 +02:00
Jonas Jenwald
8cb6efec2d [api-minor] Add a wrapper around the addLinkAttributes-function, in the API, to the PDFLinkService implementations
This patch helps reduce some duplication, given that we now have a few essentially identical `addLinkAttributes` call-sites in the code-base.
To prevent runtime errors in the Annotation/XFA-layer code, we'll warn if a custom/incomplete `PDFLinkService` is being used (limited to GENERIC builds).
2021-10-02 12:28:00 +02:00
Jonas Jenwald
bb9c905c5d Ensure that various URL-related options are applied in the xfaLayer too
Note how both the annotationLayer and the document outline will apply various URL-related options when creating the link-elements.
For consistency the `xfaLayer`-rendering should obviously use the same options, to ensure that the existing options are indeed applied to all URLs regardless of where they originate.
2021-10-02 09:32:23 +02:00
Jonas Jenwald
284d259054
Merge pull request #14057 from Snuffleupagus/bug-920426
Support CMap-data with only strings, when parsing TrueType composite fonts (bug 920426)
2021-10-01 23:22:25 +02:00
Jonas Jenwald
67a642c826 Replace a couple of Array.prototype.forEach-invocations with for..of instead
Given that `NodeList`s can be iterated using `for..of` we can use that instead, since it's a little bit nicer and easier to read than the `Array.prototype.forEach` format.
2021-10-01 09:06:17 +02:00
Calixte Denizet
aecbd7cd89 AcroForm: Add support for ResetForm action
- it aims to fix #12721.
  - Thanks to PR #14023, we've now the fieldObjects in the annotation layer so we can easily map fields names on their id if needed.
  - Reset values in the storage, in the JS sandbox and in the visible html elements.
2021-09-30 22:02:33 +02:00
Jonas Jenwald
d3ca28bc34 Support CMap-data with only strings, when parsing TrueType composite fonts (bug 920426)
In the referenced bug, the embedded fonts contain custom CMap-data that only include strings. Note how for embedded composite TrueType fonts we're using the CMap-data when building the glyph mapping, and currently we end up with a completely empty map because the code expects only CID *numbers*.
Furthermore, just fixing the glyph mapping alone isn't sufficient to fully address the bug, since we also need to consider this "special" kind of CMap-data when looking up glyph widths.
2021-09-30 18:10:47 +02:00
Tim van der Meij
9a74f3e6e0
Merge pull request #14049 from calixteman/bg_from_mk
Annotation - Use border and background colors from MK dictionary
2021-09-29 21:13:20 +02:00
Calixte Denizet
0776cd9b90 Annotation - Use border and background colors from MK dictionary
- it aims to fix #13003;
  - set the bg and fg colors as they're in the pdf;
  - put a transparent overlay to help to see the fields.
2021-09-26 20:49:26 +02:00
Jonas Jenwald
e6e04694f4 [api-minor] Move the addDefaultProtocolToUrl/tryConvertUrlEncoding functionality into the createValidAbsoluteUrl function
Having recently worked with, and reviewed patches touching, this code it seemed that it's probably not a bad idea to move that functionality into `createValidAbsoluteUrl` as new options instead.

For the `addDefaultProtocolToUrl` functionality in particular, the existing helper function was not only moved but slightly improved as well. Looking at the code, I realized that there's a small risk that it would incorrectly match a *relative* URL-string too.

With these changes, the `createValidAbsoluteUrl` call-sites in the `src/core/`-code can be simplified a little bit.

*Please note:* This patch may, indirectly, change the format of the `unsafeUrl`-property returned with relevant Annotations and OutlineItems; hence the `api-minor` tag.
However, I'd argue that it's actually more correct this way since the whole purpose of `unsafeUrl` is/was to return the URL data as-is without any parsing done.
2021-09-26 14:29:54 +02:00
Calixte Denizet
558e58f354 XFA - Add <a> element in button when an url is detected (bug 1716758)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1716758;
  - some buttons have a JS action with the pattern `app.launchURL(...)` (or similar) so extract when it's possible the url and generate a <a> element with the href equals to the found url;
  - pdf.js already had some code to handle that so this patch slightly refactor that.
2021-09-25 21:59:39 +02:00
Calixte Denizet
c0e9108d00 Annotation - Some checkboxes have an empty N dictionary
- it aims to fix #14021;
  - the N dict is empty here so just create a default one;
  - it implies that the checked checkbox has no appearance so create a default one too in order to print it;
  - in the pdf in the issue, a checked box is not printed because it has no default appearance so we need to guess its appearance from its state.
2021-09-25 16:00:47 +02:00
Tim van der Meij
cc110b8542
Merge pull request #14064 from Snuffleupagus/issue-13845
Fallback to font name matching, when checking for serif fonts (issue 13845)
2021-09-25 12:41:57 +02:00
Jonas Jenwald
b23b8d8a5d
Merge pull request #14074 from Snuffleupagus/issue-14046
[api-minor] Add basic support for RTL text-content in PopupAnnotations (issue 14046)
2021-09-25 12:37:44 +02:00
Tim van der Meij
36dc93fe5d
Merge pull request #14065 from Snuffleupagus/fewer-EXPORT_DATA_PROPERTIES
[api-minor] Stop exporting, by default, a few additional Font properties (PR 11777 follow-up)
2021-09-25 12:25:56 +02:00
Tim van der Meij
ee34572fd0
Merge pull request #14070 from Snuffleupagus/MessageHandler-local-vars
Some small readability improvements in the `MessageHandler` code
2021-09-25 12:22:17 +02:00
Tim van der Meij
07558c158d
Merge pull request #14069 from Snuffleupagus/deprecate-OPS-paintJpegXObject
Mark the `paintJpegXObject` operator as deprecated (PR 11601 follow-up)
2021-09-25 12:15:33 +02:00
Jonas Jenwald
1dcd2f0cd3 [api-minor] Add basic support for RTL text-content in PopupAnnotations (issue 14046)
In order to implement this, we utilize the existing `bidi` function to infer the text-direction of /T and /Contents entries. While this may not be perfect in cases where one PopupAnnotation mixes LTR and RTL languages, it should work well enough in most cases.
To avoid having to add *two new* properties in lots of annotations, supplementing the existing `title`/`contents`-properties, this patch instead re-factors the existing code such that the properties are replaced by Objects (containing `str` and `dir`).

*Please note:* In order avoid breaking existing third-party implementations, `GENERIC`-builds of the PDF.js library will still provide the old `title`/`contents`-properties on annotations returned by `PDFPageProxy.getAnnotations`.
2021-09-25 09:18:58 +02:00
calixteman
104e049338
Merge pull request #14073 from calixteman/bindItems
XFA - Bind items when there's a bindItems entry
2021-09-24 09:01:52 -07:00
Calixte Denizet
97c1e076a1 XFA - Bind items when there's a bindItems entry
- In the pdf in issue #14071, some select fields don't contain any values;
  - the corresponding node has a bindItems and a bind elements and _bindItems function was just not called.
2021-09-24 16:08:58 +02:00
Calixte Denizet
cd73e282eb XFA - Create a new page in case of overflow
- it aims to fix #14071;
  - a subform is overflowing and the the target in case of overflow is itself. In this case we must create a new page.
2021-09-24 14:57:55 +02:00
Jonas Jenwald
890a6c1108 Some small readability improvements in the MessageHandler code
In particular the `_processStreamMessage`-method is a bit cumbersome to read, given the way that the current streamController/streamSink is accessed, which we can improve with a couple of local variables.
2021-09-24 13:07:20 +02:00
Jonas Jenwald
7d56fb4cbf Mark the paintJpegXObject operator as deprecated (PR 11601 follow-up)
After PR 11601, the `paintJpegXObject` operator is no longer used for anything. While I don't think we can just remove it, and essentially leave a "hole" in the `OPS` structure, we should at least mark it as explicitly unused to aid readability/maintainability of the code.
2021-09-24 12:47:28 +02:00
Brendan Dahl
d370a281c4
Merge pull request #14067 from calixteman/1732344
Don't save anything in XFA entry if no XFA! (bug 1732344)
2021-09-23 15:07:00 -07:00
Calixte Denizet
4b0538d07a Don't save anything in XFA entry if no XFA! (bug 1732344)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1732344
  - rename some variables to have a more clear code;
  - and last but no least, add a unit test to test saving.
2021-09-23 19:51:23 +02:00
Jonas Jenwald
fd1f0f647f Print a special warning message, in the viewer, for XFA Foreground documents
Currently XFAF documents use the same warning message as in the XFA *disabled* case, which is neither helpful nor correct.
2021-09-23 15:02:24 +02:00
Jonas Jenwald
6cba5509f2 Re-factor document.getElementsByName lookups in the AnnotationLayer (issue 14003)
This replaces direct `document.getElementsByName` lookups with a helper method which:
 - Lets the AnnotationLayer use the data returned by the `PDFDocumentProxy.getFieldObjects` API-method, such that we can directly lookup only the necessary DOM elements.
 - Fallback to using `document.getElementsByName` as before, such that e.g. the standalone viewer components still work.

Finally, to fix the problems reported in issue 14003, regardless of the code-path we now also enforce that the DOM elements found were actually created by the AnnotationLayer code.
With these changes we'll thus be able to update form elements on all visible pages just as before, but we'll additionally update the AnnotationStorage for not-yet-rendered elements thus fixing a pre-existing bug.
2021-09-23 13:05:18 +02:00
Jonas Jenwald
9acfe486d4 Fallback to font name matching, when checking for serif fonts (issue 13845)
In order to handle fonts that specify completely bogus /Flags-entries, fallback to font name matching to determine if the font is a serif one.
2021-09-23 01:11:57 +02:00
Jonas Jenwald
e027748627 [api-minor] Stop exporting, by default, a few additional Font properties (PR 11777 follow-up)
*This is similar to the "isSymbolicFont"-property, which is no longer exported by default after PR 11777.*

Both "isMonospace" and "isSerifFont" are internal properties, used during font parsing and building of the glyph mapping on the worker-thread.
However both of these properties are completely unused on the main-thread and/or in the API, and accessing them they will now require setting the `fontExtraProperties`-option when calling `getDocument`.
2021-09-23 00:44:43 +02:00
Tim van der Meij
5254676ef3
Merge pull request #14055 from Snuffleupagus/PDF_TO_CSS_UNITS
Add `PDF_TO_CSS_UNITS` to the `PixelsPerInch`-structure
2021-09-22 22:24:51 +02:00
Jonas Jenwald
81a1c1cef7 Correctly validate URLs in XFA documents (bug 1731240)
With this patch we'll ensure that only valid absolute URLs can be used in XFA documents, similar to the existing validation done for "regular" PDF documents.
Furthermore, we'll also attempt to add a default protocol (i.e. `http`) to URLs beginning with "www." in XFA documents as well; this on its own is enough to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1731240
2021-09-21 21:21:01 +02:00
Jonas Jenwald
3e550f392a Add PDF_TO_CSS_UNITS to the PixelsPerInch-structure
Rather than re-computing this value in a number of different places throughout the code-base[1], we can expose this in the API via the existing `PixelsPerInch`-structure instead.
There's also been feature requests asking for the old `CSS_UNITS` viewer constant to be made accessible, such that it could be used in third-party implementations.

I suppose that it could be argued that it's somewhat confusing to place a unitless property in `PixelsPerInch`, however given that the `PDF_TO_CSS_UNITS`-property is defined strictly in terms of the existing properties this is hopefully deemed reasonable.

---
[1] These include:
 - The viewer, with the `CSS_UNITS` name.
 - The reference-tests.
 - The display-layer, when rendering images; see PR 13991.
2021-09-20 13:20:09 +02:00
Jonas Jenwald
8ea27ce157 Tweak how fonts with an /Encoding are handled in adjustToUnicode (issue 14048, PR 13277 follow-up)
Currently we only exclude /Encoding entries that also contains a /Differences array, which is the cause of the text-selection problem in the referenced issue.
In order to address this we'll now also exclude /Encoding entries that contain one of the predefined *named* encodings, and no longer require that it also contains a /Differences array.

*Please note:* This patch cases a small "regression" in the `bug1130815-text` test-case, however this is actually an improvement when compared with Adobe Reader and PDFium (in Google Chrome).
2021-09-18 22:44:25 +02:00
Tim van der Meij
83d3bb43f4
Merge pull request #14041 from Snuffleupagus/issue-9367
Support cmaps with only CID characters, when building the ToUnicode-map (issue 9367)
2021-09-18 16:47:06 +02:00
Jonas Jenwald
20eb6ca2ec
Merge pull request #14044 from calixteman/bug1719148
Annotations - Avoid empty value in text field when storage contains something for it (bug 1719148)
2021-09-18 16:31:45 +02:00
Jonas Jenwald
6634afd646
Merge pull request #14045 from calixteman/noise
XFA - Only warn about the wrong xfa type when there is an xfa thing
2021-09-18 16:13:20 +02:00
Tim van der Meij
c870fb489e
Merge pull request #14013 from Snuffleupagus/api-unittest-instanceof
Improve the API unit-tests, and try to expose more API-functionality in the TypeScript definitions
2021-09-18 16:08:19 +02:00
Calixte Denizet
2fc10727c5 XFA - Only warn about the wrong xfa type when there is an xfa thing 2021-09-18 15:44:05 +02:00
calixteman
ffa2572bdf
Merge pull request #14038 from calixteman/saveas
JS - Implement few possibilities with app.execMenuItem (bug 1724399)
2021-09-18 15:33:03 +02:00
Calixte Denizet
eb762ad624 Annotations - Avoid empty value in text field when storage contains something for it (bug 1719148)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1719148;
  - JS can set a property for a non-rendered annotation using the annotationStorage but the other unset default properties must be used when the annotation is finally rendered;
  - so this patch just adds the properties already set in the annotationStorage to the default value.
2021-09-18 15:08:22 +02:00
Calixte Denizet
bfd570038d JS - Implement few possibilities with app.execMenuItem (bug 1724399)
- it aims to fix: https://bugzilla.mozilla.org/show_bug.cgi?id=1724399.
2021-09-18 13:52:32 +02:00
Jonas Jenwald
e3223b68fc Extract some of the glyphMap handling, for non-embedded composite standard fonts, into a helper function
This reduces some unnecessary duplication, since we currently have essentially the same code in a handful of places in the `Font.fallbackToSystemFont`-method.
2021-09-18 12:39:48 +02:00
Jonas Jenwald
ed73cf6d50 Support cmaps with only CID characters, when building the ToUnicode-map (issue 9367)
In this particular case the `CMap`-data that we create contains only numbers, but no strings, which causes `PartialEvaluator.readToUnicode` to create a ToUnicode-map with only empty strings.

*Please note:* This is yet another case where I don't know if it's necessarily the best and most correct solution, but it does fix the referenced issue.
2021-09-18 00:26:15 +02:00
Calixte Denizet
e87c12bf34 JS - Avoid the Stay/Leave popup when clicking on a button with a JS action
- it aims to fix #14039.
2021-09-17 21:04:07 +02:00
Calixte Denizet
5bef8120e7 Annotation - For checkboxes, get field value from AS (if any) instead of V (bug 1722036)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1722036.
  - AS and V should share the same value for checkbox: it's at least what the specs say;
  - the pdf in the above bug opens correctly in Acrobat so it likely means that AS is chosen over V.
2021-09-17 13:04:16 +02:00
Brendan Dahl
d6a27860e3
Merge pull request #14025 from Snuffleupagus/issue-11915
Improve glyph mapping for non-embedded composite standard fonts with a /CIDToGIDMap (issue 11915)
2021-09-16 08:06:35 -07:00
Calixte Denizet
a3aa6dd6ab Annotation - Checkboxes with the same name and export values must be in unison
- it aims to fix #14024.
  - this patch adds an attribute `acroformExportValue` to the HTML input in order to set the checked attribute in taking into account the exportValue for the checkboxes with the same name.
2021-09-15 15:30:24 +02:00
Jonas Jenwald
a11343e9af Improve glyph mapping for non-embedded composite standard fonts with a /CIDToGIDMap (issue 11915)
*Please note:* All of this feels very handwavy, but at least it passes all tests locally. Hopefully we have enough tests for this part of the font code.

For non-embedded composite standard fonts with an "incomplete" /CIDToGIDMap, we'll now fallback to an *explicitly defined* /ToUnicode map even when that one happens to be an /Identity-H or /Identity-V map.

The `Font.fallbackToSystemFont` method is unfortunately getting more and more special-cases, however that might be unavoidable given all the weird non-embedded fonts found in the wild :-(
2021-09-15 11:30:40 +02:00
Calixte Denizet
9812e35916 XFA - Don't create images for unsupported mime types 2021-09-14 10:55:25 +02:00
Jonas Jenwald
95057a4e56 Try to expose more API-functionality in the TypeScript definitions
While these types apparently makes sense in TypeScript environments, we really don't want to extend the *public* API by simply exporting the relevant classes directly in `src/pdf.js` (since they should never be called/initialized manually).

Please see e.g. issue 12384 where this was first requested, and note that a possible work-around was also provided there. This patch simply implements that work-around[1], which will hopefully be helpful to TypeScript users.

---
[1] Based on the discussion in PR 13957, the two previous patches appear to be necessary for this to actually work.
2021-09-13 13:57:56 +02:00
Jonas Jenwald
d854352cd5 Improve the API unit-tests by checking that PDFPageProxy.render returns a RenderTask-instance
This is similar to existing unit-tests, which checks for `PDFDocumentProxy`- and `PDFPageProxy`-instances.
2021-09-13 13:34:37 +02:00
Jonas Jenwald
fa7a607d33 Improve the API unit-tests by checking that getDocument returns a PDFDocumentLoadingTask-instance
This is similar to existing unit-tests, which checks for `PDFDocumentProxy`- and `PDFPageProxy`-instances.
2021-09-13 13:34:28 +02:00
Jonas Jenwald
7025b9f859 [src/core/writer.js] Support null values in the writeValue function
*This fixes something that I noticed, having recently looked at both the `Lexer.getObj` and `writeValue` code.*

Please note that I unfortunately don't have an example of a form where saving fails without this patch. However, given its overall simplicity and that unit-tests are added, it's hopefully deemed useful to fix this potential issue pro-actively rather than waiting for a bug report.

At this point one might, and rightly so, wonder if there's actually any real-world PDF documents where a `null` value is being used?
Unfortunately the answer is *yes*, and we have a couple of examples in the test-suite (although none of those are related to forms); please see: `issue1015`, `issue2642`, `issue10402`, `issue12823`, `issue13823`, and `pr12564`.
2021-09-12 18:24:37 +02:00
Jonas Jenwald
5d578ea36a [src/core/writer.js] Remove unnecessary string-wrapping for boolean values in writeValue (PR 13998 follow-up) 2021-09-12 15:45:45 +02:00
Jonas Jenwald
761519ef3f
Merge pull request #13998 from calixteman/bug1729971
Write boolean value when saving a form (bug 1729971)
2021-09-12 15:38:10 +02:00
Jonas Jenwald
a47844d1fc Let Lexer.getObj return a dummy-Cmd for commands that start with a non-visible ASCII character (issue 13999)
This way we avoid breaking badly generated PDF documents where a non-visible ASCII character is "glued" to a valid command.
2021-09-11 19:54:13 +02:00
Tim van der Meij
e97f01b17c
Merge pull request #13977 from Snuffleupagus/enqueueChunk-batch
[api-minor] Reduce `postMessage` overhead, in `PartialEvaluator.getTextContent`, by sending text chunks in batches (issue 13962)
2021-09-11 13:34:07 +02:00
Jonas Jenwald
0e54f568fb Re-factor the CSS_PIXELS_PER_INCH/PDF_PIXELS_PER_INCH exports (PR 13991 follow-up)
For improved maintainability, since these constants are being exposed in the official API, this patch moves them into an Object instead.
2021-09-11 11:15:25 +02:00
Jonas Jenwald
bd51bbfd16 Remove mozImageSmoothingEnabled fallback in CanvasGraphics.endGroup
This was added all the way back in PR 2936, however it's been unnecessary ever since Firefox 51 (released on 2017-01-24); please see the MDN compatibility data:
https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D/imageSmoothingEnabled#browser_compatibility
2021-09-11 10:30:39 +02:00
Jonas Jenwald
9ce63a6dc6
Merge pull request #13991 from brendandahl/interpolate
Enable/disable image smoothing based on image interpolate value. (bug 1722191)
2021-09-11 10:02:53 +02:00
Brendan Dahl
f38fb42b42 Enable/disable image smoothing based on image interpolate value. (bug 1722191)
While some of the output looks worse to my eye, this behavior more
closely matches what I see when I open the PDFs in Adobe acrobat.

Fixes: #4706, #9713, #8245, #1344
2021-09-10 14:23:35 -07:00
Calixte Denizet
474ab7c86d Write boolean value when saving a form (bug 1729971)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1729971#c4.
2021-09-10 14:10:25 +02:00
calixteman
57b80074a2
Merge pull request #13995 from calixteman/xfa_record
XFA - Handle $record shorcut in SOM expression (issue #13994)
2021-09-10 13:57:50 +02:00
Calixte Denizet
c5841b3794 XFA - Handle shorcut in SOM expression (issue #13994) 2021-09-09 19:54:45 +02:00
Calixte Denizet
623860bf8f XFA - Remove the checked attribute from the checkbox when unchecked (bug 1729877)
- it aims to fix: https://bugzilla.mozilla.org/show_bug.cgi?id=1729877.
2021-09-09 19:14:16 +02:00
Jonas Jenwald
45ddb12f61 Remove no-op onPull/onCancel streamSink callbacks from the "GetTextContent"-handler
The `MessageHandler`-implementation already handles either of these callbacks being undefined, hence there's no particular reason (as far as I can tell) to add no-op functions here.

Also, in a couple of `MessageHandler`-methods, utilize an already existing local variable more.
2021-09-09 00:01:10 +02:00
Jonas Jenwald
f90f9466e3 [api-minor] Reduce postMessage overhead, in PartialEvaluator.getTextContent, by sending text chunks in batches (issue 13962)
Following the STR in the issue, this patch reduces the number of `PartialEvaluator.getTextContent`-related `postMessage`-calls by approximately 78 percent.[1]
Note that by enforcing a relatively low value when batching text chunks, we should thus improve worst-case scenarios while not negatively affect all `textLayer` building.

While working on these changes I noticed, thanks to our unit-tests, that the implementation of the `appendEOL` function unfortunately means that the number and content of the textItems could actually be affected by the particular chunking used.
That seems *extremely* unfortunate, since in practice this means that the particular chunking used is thus observable through the API. Obviously that should be a completely internal implementation detail, which is why this patch also modifies `appendEOL` to mitigate that.[2]

Given that this patch adds a *minimum* batch size in `enqueueChunk`, there's obviously nothing preventing it from becoming a lot larger then the limit (depending e.g. on the PDF structure and the CPU load/speed).
While sending more text chunks at once isn't an issue in itself, it could become problematic at the main-thread during `textLayer` building. Note how both the `PartialEvaluator` and `CanvasGraphics` implementations utilize `Date.now()`-checks, to prevent long-running parsing/rendering from "hanging" the respective thread. In the `textLayer` building we don't utilize such a construction[3], and streaming of textContent is thus essentially acting as a *simple* stand-in for that functionality.
Hence why we want to avoid choosing a too large minimum batch size, since that could thus indirectly affect main-thread performance negatively.

---
[1] While it'd be possible to go even lower, that'd likely require more invasive re-factoring/changes to the `PartialEvaluator.getTextContent`-code to ensure that the batches don't become too large.

[2] This should also, as far as I can tell, explain some of the regressions observed in the "enhance" text-selection tests back in PR 13257.
    Looking closer at the `appendEOL` function it should potentially be changed even more, however that should probably not be done here.

[3] I'd really like to avoid implementing something like that for the `textLayer` building as well, given that it'd require adding a fair bit of complexity.
2021-09-09 00:01:07 +02:00