Commit Graph

5262 Commits

Author SHA1 Message Date
Calixte Denizet
687c9a8710 Improve performance of applyMaskImageData
- write uint32 values instead of uint8 ones to avoid the check before clamping;
- unroll the loop that writes data into the buffer;
- but keep a loop for the last element of a line: it likely doesn't hurt
  that much since it's executed only once per line;
- I tested on a MacBook with an Apple chip, and on Firefox Nightly the new
  code is almost 3.5x faster than before (~1.8x with Chrome).
2022-04-09 22:19:02 +02:00
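The uint32 approach described in 687c9a8710 can be sketched roughly as follows. This is a minimal illustration, not the actual `applyMaskImageData` code: the function name `expandMask`, its signature, and the choice to map set bits to transparent pixels are all assumptions made for the example.

```javascript
// Hypothetical sketch, not the actual PDF.js `applyMaskImageData` code.
// Expands a 1-bit-per-pixel mask into RGBA pixels, one uint32 write per pixel.
function expandMask(src, width, height) {
  // Alpha is the last byte of a pixel in memory, so its position inside a
  // uint32 depends on the platform's byte order.
  const littleEndian = new Uint8Array(new Uint32Array([1]).buffer)[0] === 1;
  const opaque = littleEndian ? 0xff000000 : 0x000000ff;
  // Assumption: a set bit maps to transparent, a clear bit to opaque black.
  const one = 0;
  const zero = opaque;

  const dest = new Uint32Array(width * height);
  const fullBytes = width >> 3; // source bytes holding 8 complete pixels
  const remainder = width & 7; // pixels left over at the end of each line
  let srcPos = 0;
  let destPos = 0;

  for (let row = 0; row < height; row++) {
    const end = srcPos + fullBytes;
    for (; srcPos < end; srcPos++) {
      const b = src[srcPos];
      // Unrolled: eight uint32 writes per source byte, no per-channel clamping.
      dest[destPos++] = b & 0x80 ? one : zero;
      dest[destPos++] = b & 0x40 ? one : zero;
      dest[destPos++] = b & 0x20 ? one : zero;
      dest[destPos++] = b & 0x10 ? one : zero;
      dest[destPos++] = b & 0x08 ? one : zero;
      dest[destPos++] = b & 0x04 ? one : zero;
      dest[destPos++] = b & 0x02 ? one : zero;
      dest[destPos++] = b & 0x01 ? one : zero;
    }
    if (remainder === 0) {
      continue;
    }
    // Plain loop for the trailing bits; it runs only once per line.
    const b = src[srcPos++];
    for (let j = 0; j < remainder; j++) {
      dest[destPos++] = b & (1 << (7 - j)) ? one : zero;
    }
  }
  // View the same memory as bytes, e.g. to build an ImageData from it.
  return new Uint8ClampedArray(dest.buffer);
}
```

Writing whole pixels through a `Uint32Array` view means each pixel costs one store instead of four clamped byte stores, which is presumably where much of the reported speed-up comes from.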
Calixte Denizet
040fcae5ab Improve performance with image masks (bug 857031)
- it aims to partially fix the performance issue reported in https://bugzilla.mozilla.org/show_bug.cgi?id=857031;
- the idea is to avoid using byte arrays and to use ImageBitmap instead, which is much faster to draw:
  * an ImageBitmap is Transferable, which means that it can be built in the worker instead of on the main thread:
    - this is achieved by using an OffscreenCanvas when it's available; there is a bug to enable them
      for pdf.js: https://bugzilla.mozilla.org/show_bug.cgi?id=1763330;
    - or by using createImageBitmap: in Firefox a task is sent to the main thread to build the bitmap so
      it's slightly slower than using an OffscreenCanvas.
  * it's transferred from the worker to the main thread by "reference";
  * the byte buffers used to create the image data have a very short lifetime, and ergo the memory used is globally
    less than before.
- Use the localImageCache for the mask;
- Fix the pdf issue4436r.pdf: it was expected to have a binary stream for the image;
- Move the singlePixel trick from operator_list to image: this way we can use this trick even if it isn't in a set
  as defined in operator_list.
2022-04-09 18:26:26 +02:00
apeltop
a97dd26389 Correct typos 2022-04-09 09:43:18 +09:00
Jonas Jenwald
a919959d83 Slightly simplify the Catalog._readMarkInfo method
We don't need to first check if the Dictionary contains the key, since trying to get a non-existent key simply returns `undefined` and we're already ensuring that the value is a boolean.
Furthermore, we shouldn't need to worry about the `Object.prototype` containing enumerable properties since the checks (in `src/core/worker.js`) done for `Array.prototype` *indirectly* also cover `Object`s. (Keep in mind that an `Array` is just a special kind of `Object` in JavaScript.)
2022-04-05 16:37:51 +02:00
Jonas Jenwald
1dc4713a0b Re-factor the isLittleEndian/isEvalSupported caching
This functionality is very old, hence we should be able to improve the caching a little bit with modern JavaScript features.
2022-04-05 16:01:01 +02:00
Calixte Denizet
f4fcb59a5e Refactor some xfa*** getters in document.js
- it's a follow-up of PR #14735.
2022-04-03 20:38:12 +02:00
Jonas Jenwald
f33ce5fc2d Decode non-ASCII values found in the xfa:datasets (PR 14735 follow-up)
*Please note:* This is possibly bad/wrong in general, but I figured that submitting it for review wouldn't hurt.

It seems that even Adobe Reader doesn't handle the non-ASCII characters that appear in some of the fields correctly, however it should be pretty easy to improve things on the PDF.js side.
2022-04-01 11:54:34 +02:00
Jonas Jenwald
36a289d747
Merge pull request #14735 from calixteman/14685
[Annotations] Some annotations can have their values stored in the xfa:datasets
2022-04-01 11:30:16 +02:00
Calixte Denizet
0b597304c1 [Annotations] Some annotations can have their values stored in the xfa:datasets
- it aims to fix #14685;
- add a basic object to get values from the parsed datasets;
- these annotations don't have an appearance so we must create one when printing or saving.
2022-04-01 10:28:04 +02:00
Jonas Jenwald
addb4cb12b Use String.prototype.repeat() in a couple of spots
Rather than using a temporary Array to manually create repeated strings, we can use `String.prototype.repeat()` instead.
The reason that we didn't use this from the start is most likely because some browsers, notably IE, didn't support this; note https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/repeat#browser_compatibility
2022-03-30 15:42:40 +02:00
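The change in addb4cb12b amounts to the following kind of rewrite. Both function names here are made up for illustration and do not appear in the PDF.js code-base.

```javascript
// Hypothetical helpers illustrating the before/after pattern.

// Before: build the repeated string through a temporary Array.
function indentWithArray(level) {
  const buf = [];
  for (let i = 0; i < level; i++) {
    buf.push("  ");
  }
  return buf.join("");
}

// After: let the built-in method do the work.
function indentWithRepeat(level) {
  return "  ".repeat(level);
}
```

Note that `String.prototype.repeat()` throws a `RangeError` for negative counts, whereas the loop-based version silently returns an empty string, so call-sites need a non-negative count.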
Calixte Denizet
ad3fb71a02 [Annotations] Add support for printing/saving choice list with multiple selections
- it aims to fix issue #12189.
2022-03-29 18:59:44 +02:00
Jonas Jenwald
0dd6bc9a85
Merge pull request #14703 from calixteman/14627
[text selection] Add the whitespaces present in the pdf in the text chunk
2022-03-27 15:20:19 +02:00
Calixte Denizet
18e79e3c0b [text selection] Add the whitespaces present in the pdf in the text chunk
- it aims to fix issue #14627;
- the basic idea of the recent text refactoring was to only consider the rendered visible whitespaces.
  But sometimes, the heuristics aren't correct and although some whitespaces are in the text stream
  they weren't in the text chunks because they were too small. Hence we added some exceptions, for example,
  we always add a whitespace when it is between two non-whitespace chars but only when in the same Tj.
  So basically, this patch removes the constraint to have the chars in the same Tj
  (by using a circular buffer to save the last two chars) but doesn't add a space when the visible space is really
  too small (hence `NOT_A_SPACE_FACTOR`).
2022-03-27 14:34:56 +02:00
Jonas Jenwald
7f0589c74a Change the type of the container property, in the TextLayerRenderParameters typedef (issue 14716)
Given that the textLayer-code has been using a `DocumentFragment` ever since PR 3356 (back in 2013), simply updating the type of the `container` property should be fine.
This patch also tries to, ever so slightly, improve the grammar of a couple of other properties in the typedef.
2022-03-24 22:42:37 +01:00
Jonas Jenwald
849de5a508 Slightly improve validation of (some) parameters in getDocument
There's a couple of `getDocument` parameters that should be numbers, but which are currently not *fully* validated to prevent issues elsewhere in the code-base.
Also, improves validation of the `ownerDocument` parameter since we currently accept more-or-less anything here.
2022-03-21 13:32:17 +01:00
Jonas Jenwald
73d2ddac0d Update npm packages
Note that the Prettier update made it possible to move a couple of comments after `default:`-cases back to their original/intended positions, please see https://prettier.io/blog/2022/03/16/2.6.0.html
2022-03-20 10:59:13 +01:00
Calixte Denizet
f0b549c2a2 [JS] - Parse a date in using the given format first and then try the default date parser
- it aims to fix #14672.
2022-03-19 16:07:43 +01:00
Tim van der Meij
5de6af4e64
Merge pull request #14683 from Snuffleupagus/sendTest-cleanup
[src/display/api.js] Simplify the `sendTest` function, used with Worker initialization (PR 14291 follow-up)
2022-03-19 13:38:05 +01:00
Jonas Jenwald
c0736647f9 Add general iteration support in the RefSet and RefSetCache classes
This patch removes the existing `forEach` methods, in favor of making the classes properly iterable instead. Given that the classes are using a `Set` respectively a `Map` internally, implementing this is very easy/efficient and allows us to simplify some existing code.
2022-03-18 14:27:34 +01:00
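Making a `Set`-backed class iterable, as c0736647f9 describes, can be as small as forwarding `Symbol.iterator`. The class below is a simplified stand-in (named `SimpleRefSet` here, with plain strings as references), not the real `RefSet`.

```javascript
// Hypothetical, simplified stand-in for a Set-backed reference container.
class SimpleRefSet {
  constructor() {
    this._set = new Set();
  }

  has(ref) {
    return this._set.has(ref);
  }

  put(ref) {
    this._set.add(ref);
  }

  // Delegating to the internal Set makes `for...of` and spread work,
  // which replaces the need for a custom `forEach` method.
  [Symbol.iterator]() {
    return this._set.values();
  }
}

// Usage: iterate directly instead of passing a callback.
const refs = new SimpleRefSet();
refs.put("1R");
refs.put("5R");
const seen = [];
for (const ref of refs) {
  seen.push(ref);
}
```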
Jonas Jenwald
be2b1d5d2a [src/display/api.js] Simplify the sendTest function, used with Worker initialization (PR 14291 follow-up)
Given that we now only use Workers when `postMessage` transfers are supported, there's really no point in trying to send a "test" message *without* transfers present.
Hence, if `postMessage` transfers are not supported by the browser, we'll now fallback to "fake" Workers immediately instead. The comment about Opera is also removed, since it was originally added back in PR 983 and mentions Opera `11.60` [which was released in 2011](https://en.wikipedia.org/wiki/History_of_the_Opera_web_browser#Version_11).
2022-03-16 13:25:41 +01:00
Jonas Jenwald
d5c9be341d [src/display/api.js] Use private static class fields, rather than shadowed getter work-arounds (PR 13813, 13882 follow-up)
At the time private static class fields were too new, however that's no longer an issue and we can thus (ever so slightly) simplify the code.
2022-03-16 13:02:34 +01:00
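To illustrate the difference that d5c9be341d refers to, here is a sketch of a lazily "shadowed" static getter next to a private static field. The class and member names are invented for this example and the shadowing is written out by hand rather than using any PDF.js helper.

```javascript
// Hypothetical classes; names are illustrative only.

// Work-around: a static getter that replaces itself with a plain data
// property on first access, so the value is created once.
class WithShadowedGetter {
  static get _ports() {
    const value = new WeakMap();
    Object.defineProperty(this, "_ports", {
      value,
      enumerable: true,
      configurable: true,
      writable: false,
    });
    return value;
  }
}

// With private static class fields the cache is created once, and it is
// also not reachable from outside the class.
class WithPrivateStatic {
  static #ports = new WeakMap();

  static register(port, worker) {
    WithPrivateStatic.#ports.set(port, worker);
  }

  static lookup(port) {
    return WithPrivateStatic.#ports.get(port);
  }
}
```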
Jonas Jenwald
0c349c701f Remove the addLinkAttributes warnings in the Annotation/XFA-layers (PR 14092 follow-up)
These warnings have now been present in three releases, see PR 14092, hence it should (hopefully) be fine to remove them now.
2022-03-13 11:38:56 +01:00
Tim van der Meij
790735eaf1
Merge pull request #14658 from Snuffleupagus/api-validate-cMapUrl-standardFontDataUrl
Validate the `cMapUrl`/`standardFontDataUrl` parameters in `getDocument`
2022-03-11 21:09:58 +01:00
Jonas Jenwald
a60b98412f Validate the cMapUrl/standardFontDataUrl parameters in getDocument
These changes make sense for two reasons:
 - Given that the parameters are potentially passed to the worker-thread, depending on the `useWorkerFetch` parameter, we need to prevent errors if the user provides values that aren't clonable.
 - By ensuring that the default values are indeed `null`, we'll trigger main-thread fetching (of CMaps and Standard fonts) as intended in the `PartialEvaluator` and thus potentially provide better Error messages.
2022-03-10 16:33:10 +01:00
Jonas Jenwald
537ed37835 Move the isSameOrigin helper function
This function is currently placed in the `src/shared/util.js` file, which means that the code is duplicated in both of the *built* `pdf.js` and `pdf.worker.js` files. Furthermore, it only has a single call-site which is also specific to the `GENERIC`-build of the PDF.js library.

Hence this helper function is instead moved into the `src/display/api.js` file, in such a way that it's conditionally defined but still can be unit-tested.
2022-03-10 13:51:09 +01:00
Tim van der Meij
e85bb0b599
Merge pull request #14645 from Snuffleupagus/Node-DOMMatrix-polyfill
[api-minor] Remove the, in `legacy` builds, bundled `DOMMatrix` polyfill
2022-03-09 20:38:26 +01:00
Tim van der Meij
55a931e454
Merge pull request #14648 from Snuffleupagus/PDFDocument-stream
Simplify the `PDFDocument` constructor
2022-03-09 20:36:49 +01:00
Jonas Jenwald
6a78f20b17 Simplify the PDFDocument constructor
Originally the code in the `src/`-folder was shared between the main/worker-threads, and back then it probably made sense that the `PDFDocument` constructor accepted different arguments.
However, for many years we've not been passing anything *except* Streams to `PDFDocument` and we should thus be able to slightly simplify that code. Note that for e.g. unit-tests of this code, using either a `NullStream` or a `StringStream` works just fine.
2022-03-08 17:13:47 +01:00
Jonas Jenwald
157a71d404 [api-minor] Remove the, in legacy builds, bundled DOMMatrix polyfill
According to the MDN compatibility data, see https://developer.mozilla.org/en-US/docs/Web/API/DOMMatrix/DOMMatrix#browser_compatibility, all browsers that we support have native `DOMMatrix` implementations (since quite some time too).

Hence Node.js is the only environment that lacks `DOMMatrix` support, which probably isn't that surprising given that it's browser functionality.
While the `DOMMatrix` polyfill isn't that large, it nonetheless seems completely unnecessary to bundle it in the `legacy` builds when it's not needed in browsers. However, we can avoid that by simply listing `dommatrix` as a dependency for the `pdfjs-dist` library.
2022-03-08 10:29:11 +01:00
Jonas Jenwald
6f600befdd Update TypeScript to version 4.6.2 and work-around stricter type checks
I'm guessing that we're now running into the class-related improvements mentioned in https://devblogs.microsoft.com/typescript/announcing-typescript-4-6/#target-es2022
To unblock this update, and any future ones, this patch simply tweaks the JSDocs to get `gulp typestest` to run without errors.
2022-03-07 11:55:17 +01:00
Tim van der Meij
5242c38af5
Merge pull request #14628 from Snuffleupagus/issue-14626
When `stopAtErrors` is set, throw rather than warn when exceeding `maxImageSize` (issue 14626)
2022-03-05 13:09:36 +01:00
Tim van der Meij
5d12ac576b
Merge pull request #14631 from Snuffleupagus/typedef-fixes
Fix a couple of small typos in JSDoc `typedef` comments
2022-03-05 13:06:53 +01:00
Jonas Jenwald
939e6f0c4c Fix a couple of small typos in JSDoc typedef comments
While this doesn't affect the official API documentation, these cases should nonetheless be fixed.
2022-03-04 12:11:52 +01:00
Jonas Jenwald
1a7921dbf0 Compute the loca table endOffset, of the "first" glyph, correctly (issue 14618)
When there are *multiple* empty glyphs at the start of the data, ensure that the "first" glyph gets a correct `endOffset` to avoid skipping it during parsing in the `sanitizeGlyph` function.
2022-03-03 14:22:45 +01:00
Jonas Jenwald
d0d5c596fb When stopAtErrors is set, throw rather than warn when exceeding maxImageSize (issue 14626)
The situation described in issue 14626 seems like a fairly special case, and it thus seems reasonable that we simply follow the same pattern as elsewhere in the `PartialEvaluator` when the `stopAtErrors` API-option is being used.
2022-03-03 13:11:29 +01:00
Brendan Dahl
85ff7b117e
Merge pull request #14536 from calixteman/thin_line
Fix some issues with lineWidth < 1 after transform (bug 1753075, bug 1743245, bug 1710019)
2022-03-02 09:46:15 -08:00
Jonas Jenwald
ab55071568 Remove the JSDocs "External: Promise"-page, since Promises are now a standard feature
The "External: Promise"-page in the JSDocs pre-dates the introduction of `Promise`s, as a generally available standard JS feature, by a number of years. Hence it no longer seems necessary, as far as I can tell, to include this "special" page in the documentation.

Also, while unrelated to the rest of the patch, updates the `test/`-folder description in the documentation.
2022-02-26 23:53:11 +01:00
calixteman
046ff07ee3
Merge pull request #14610 from Snuffleupagus/jpx-resetContextProbabilities
[JPEG 2000] Add support for resetContextProbabilities (bug 1731483)
2022-02-26 18:26:39 +01:00
Jonas Jenwald
99cd24ce3e Remove the isString helper function
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isString`-calls.
2022-02-26 16:33:41 +01:00
Jonas Jenwald
6bd4e0f5af Re-factor the PDFDocument.documentInfo method
This removes the `DocumentInfoValidators` structure, and thus (slightly) simplifies the code overall. With these changes we only have to iterate through, and validate, the actually available Dictionary entries.
2022-02-26 16:33:21 +01:00
Tim van der Meij
f782f5e5bb
Merge pull request #14607 from Snuffleupagus/wrapReason-unreachable
Simplify the `wrapReason` helper function
2022-02-26 15:37:29 +01:00
Tim van der Meij
cf7ce0aa7e
Merge pull request #14600 from Snuffleupagus/getPageIndex-more-validation
[api-minor] Add validation for the `PDFDocumentProxy.getPageIndex` method
2022-02-26 15:30:00 +01:00
Jeff Muizelaar
9b9609a6d8 [JPEG 2000] Add support for resetContextProbabilities (bug 1731483) 2022-02-26 13:05:23 +01:00
Calixte Denizet
46369e4aa5 Fix some issues with lineWidth < 1 after transform (bug 1753075, bug 1743245, bug 1710019)
- it aims to fix:
   - https://bugzilla.mozilla.org/show_bug.cgi?id=1753075;
   - https://bugzilla.mozilla.org/show_bug.cgi?id=1743245;
   - https://bugzilla.mozilla.org/show_bug.cgi?id=1710019;
   - issue #13211;
   - issue #14521.
 - previously we were trying to adjust lineWidth to have something correct after the current transform is applied, but this approach was not correct because in the end the pixel is rescaled with the same factor in both directions.
  And sometimes those factors must be different (see bug 1753075).
 - So the idea of this patch is to apply a scale matrix to the current transform just before setting lineWidth and stroking. This scale matrix is computed in order to ensure that, after the transform, a pixel will have both of its thicknesses greater than 1.
2022-02-25 18:37:34 +01:00
Jonas Jenwald
28fc8248f0 Simplify the wrapReason helper function
All call-sites that use `wrapReason` should be passing a (possibly cloned) `Error` to the helper function, hence we shouldn't need to have a fallback code-path for any other data.
Note that for the `cancel`/`error` methods on Streams, since PR 11115 we've been asserting that the argument is in fact an `Error` as intended.
When calling `wrapReason` from *rejected* Promises, we should also be guaranteed that an `Error` is provided thanks to the ESLint rules `no-throw-literal` and `prefer-promise-reject-errors`.
2022-02-25 18:31:12 +01:00
Jonas Jenwald
172d007598 [api-minor] Add validation for the PDFDocumentProxy.getPageIndex method
Currently we'll happily attempt to send any argument passed to this method over to the worker-thread, without doing any sort of validation.
That could obviously be quite bad, since there's first of all no protection against sending unclonable data. Secondly, it's also possible to pass data that will cause the `Ref.get` call in the worker-thread to fail immediately.

In order to address all of these issues, we'll now properly validate the argument passed to `PDFDocumentProxy.getPageIndex` and when necessary reject already on the main-thread instead.
2022-02-24 12:01:51 +01:00
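The kind of main-thread validation described in 172d007598 might look roughly like this. It is a sketch under assumptions: the helper `isValidRef`, the `send` callback standing in for the worker messaging, and the exact error text are all chosen for illustration.

```javascript
// Hypothetical validator: a reference must be a plain object carrying
// non-negative integer `num` and `gen` fields.
function isValidRef(ref) {
  return (
    typeof ref === "object" &&
    ref !== null &&
    Number.isInteger(ref.num) &&
    ref.num >= 0 &&
    Number.isInteger(ref.gen) &&
    ref.gen >= 0
  );
}

// `send` is a stand-in for whatever forwards a message to the worker-thread.
function getPageIndex(ref, send) {
  if (!isValidRef(ref)) {
    // Reject on the main-thread; nothing unclonable reaches the worker.
    return Promise.reject(new Error("Invalid pageIndex request."));
  }
  // Only forward the two validated fields, never the caller's object itself.
  return send("GetPageIndex", { num: ref.num, gen: ref.gen });
}
```

Copying just `num` and `gen` into a fresh object means that extra, possibly unclonable, properties on the caller's argument are never passed to `postMessage`.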
Jonas Jenwald
2be8036eb7 [api-minor] Reduce duplication in the "gets non-existent page" unit-test 2022-02-24 11:25:21 +01:00
Jonas Jenwald
ec87995050 Ensure that Cmd/Name is only initialized with string arguments
Trying to use a non-string argument in either a `Cmd` or a `Name` is not intended, and would basically be an implementation error. Hence we can add a non-PRODUCTION check to enforce this, similar to the existing one used e.g. in the `Dict.set` method.
2022-02-23 22:39:12 +01:00
Tim van der Meij
2bb96a708c
Merge pull request #14598 from Snuffleupagus/rm-isBool
Re-factor the `Catalog.viewerPreferences` method and remove the `isBool` helper function
2022-02-23 20:36:56 +01:00
Tim van der Meij
409cbfc817
Merge pull request #14597 from Snuffleupagus/Dict-set-validate-key
Ensure that `Dict.set` only accepts string `key`s
2022-02-23 20:31:36 +01:00
Tim van der Meij
1b51e10c9c
Merge pull request #14595 from Snuffleupagus/structuredClone-comment-support
Update the support information for `structuredClone` (PR 14392 follow-up)
2022-02-23 20:27:35 +01:00
Jonas Jenwald
3704283f5b Remove the isBool helper function
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls.
2022-02-23 13:31:03 +01:00
Jonas Jenwald
82f1ee1755 Re-factor the Catalog.viewerPreferences method
This removes the `ViewerPreferencesValidators` structure, and thus (slightly) simplifies the code overall. With these changes we only have to iterate through, and validate, the actually available Dictionary entries.
2022-02-23 13:25:56 +01:00
Jonas Jenwald
a2f9031e9a Ensure that Dict.set only accepts string keys
Trying to use a non-string `key` in a `Dict` is not intended, and would basically be an implementation error. Hence we can add a non-PRODUCTION check to enforce this, complementing the existing `value` check added in PR 11672.
2022-02-22 16:35:20 +01:00
Jonas Jenwald
48985bd221 Update the support information for structuredClone (PR 14392 follow-up)
When the `structuredClone` polyfill was added, the support information in Safari was unclear. Given that an actual version *number* is now available, see below, it seems like a good idea to update the comment accordingly.

https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#browser_compatibility
2022-02-22 12:30:54 +01:00
Jonas Jenwald
05edd91bdb Remove the isNum helper function
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isNum`-calls.

These changes were *mostly* done using regular expression search-and-replace, with two exceptions:
 - In `Font._charToGlyph` we no longer unconditionally update the `width`, since that seems completely unnecessary.
 - In `PDFDocument.documentInfo`, when parsing custom entries, we now do the `typeof`-check once.
2022-02-22 11:55:34 +01:00
Jonas Jenwald
b282814e38 Prefer instanceof Name rather than calling isName() with one argument
Unless you actually need to check that something is both a `Name` and also of the *correct* type, using `instanceof Name` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check.

This patch uses ESLint to enforce this, since we obviously still want to keep the `isName` helper function for where it makes sense.
2022-02-21 12:45:00 +01:00
Jonas Jenwald
4df82ad31e Prefer instanceof Dict rather than calling isDict() with one argument
Unless you actually need to check that something is both a `Dict` and also of the *correct* type, using `instanceof Dict` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check.

This patch uses ESLint to enforce this, since we obviously still want to keep the `isDict` helper function for where it makes sense.
2022-02-21 12:44:56 +01:00
Jonas Jenwald
67b658e8d5 Prefer instanceof Cmd rather than calling isCmd() with *one* argument
Unless you actually need to check that something is both a `Cmd` and also of the *correct* type, using `instanceof Cmd` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check.

This patch uses ESLint to enforce this, since we obviously still want to keep the `isCmd` helper function for where it makes sense.
2022-02-21 12:44:51 +01:00
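The distinction drawn in the three `instanceof` commits above (b282814e38, 4df82ad31e, 67b658e8d5) can be shown with a toy `Name` class. The definitions below are simplified assumptions for illustration, not copies of the PDF.js primitives.

```javascript
// Hypothetical, simplified versions of a primitive and its helper.
class Name {
  constructor(name) {
    this.name = name;
  }
}

// The helper does two jobs: a type check, and optionally a value check.
function isName(v, name) {
  return v instanceof Name && (name === undefined || v.name === name);
}

const value = new Name("XObject");

// One argument: equivalent to `instanceof`, but with an extra function call
// and an extra `undefined` comparison.
const viaHelper = isName(value);
const viaInstanceof = value instanceof Name;

// Two arguments: here the helper still earns its keep.
const isXObject = isName(value, "XObject");
const isFont = isName(value, "Font");
```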
Jonas Jenwald
bad15894fc Improve the JSDocs for the PDFObjects class
Given that we expose `PDFObjects`-instances, via the `commonObjs` and `objs` properties, on the `PDFPageProxy`-instances this ought to help provide slightly better TypeScript definitions.
2022-02-20 13:02:14 +01:00
Jonas Jenwald
f4712bc0ad Simplify the data stored on PDFObjects-instances
The manually tracked `resolved`-property is no longer necessary, since the same information is now directly available on all `PromiseCapability`-instances.
Furthermore, since the `PDFObjects.resolve` method is not documented as accepting e.g. only Object-data, we probably shouldn't resolve the `PromiseCapability` with the `data` and instead only store it on the `PDFObjects`-instance.[1]

---
[1] While Objects are passed by reference in JavaScript, other primitives such as e.g. strings are passed by value and the current implementation *could* thus lead to increased memory usage. Given how we're using `PDFObjects` in the PDF.js code-base none of this should be an issue, but it still cannot hurt to change this.
2022-02-20 12:33:33 +01:00
Jonas Jenwald
beecde3229 Introduce (some) private properties/methods in the PDFObjects class
This ensures that the underlying data cannot be accessed directly, from the outside, since that's definitely not intended here.
Note that we expose `PDFObjects`-instances, via the `commonObjs` and `objs` properties, on the `PDFPageProxy`-instances hence these changes really cannot hurt.
2022-02-20 12:23:30 +01:00
Jonas Jenwald
2cb2f633ac Remove the isRef helper function
This helper function is not really needed, since it's just a wrapper around a simple `instanceof` check, and it only adds unnecessary indirection in the code.
2022-02-19 15:33:42 +01:00
Tim van der Meij
df0aa1a9c4
Merge pull request #14575 from Snuffleupagus/rm-isStream
Remove the `isStream` helper function
2022-02-19 14:59:19 +01:00
Jonas Jenwald
05efe3017b Change PixelsPerInch to a class with static properties (issue 14579)
*Please note:* I'm completely fine with this patch being rejected, and the issue instead closed as WONTFIX, since this is unfortunately a case where the TypeScript definitions dictate how we can/cannot write JavaScript code.

Apparently the TypeScript definitions generation converts the existing `PixelsPerInch` code into a `namespace` and simply ignores the getter; please see a7fc0d33a1/types/src/display/display_utils.d.ts (L223-L226)

Initially I tried tagging `PixelsPerInch` as an `@enum`, see https://jsdoc.app/tags-enum.html, however that unfortunately didn't help.
Hence the only good/simple solution, as far as I'm concerned, is to convert `PixelsPerInch` into a class with `static` properties. This patch results in the following diff, for the `gulp types` build target:
```diff
@@ -195,9 +195,10 @@
      */
     static toDateObject(input: string): Date | null;
 }
-export namespace PixelsPerInch {
-    const CSS: number;
-    const PDF: number;
+export class PixelsPerInch {
+    static CSS: number;
+    static PDF: number;
+    static PDF_TO_CSS_UNITS: number;
 }
 declare const RenderingCancelledException_base: any;
 export class RenderingCancelledException extends RenderingCancelledException_base {
```
2022-02-19 09:05:40 +01:00
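A class shaped like the one in the 05efe3017b diff could be written as below. The property names come from the diff; the numeric values (96 CSS pixels and 72 PDF points per inch) and the way the ratio is derived are assumptions based on the usual unit definitions, since the diff only shows types.

```javascript
// Sketch only: property names are taken from the diff, values are assumed.
class PixelsPerInch {
  static CSS = 96.0; // CSS pixels per inch

  static PDF = 72.0; // PDF points per inch

  // In a static initializer `this` refers to the class itself.
  static PDF_TO_CSS_UNITS = this.CSS / this.PDF;
}

// Usage: convert a page width given in PDF points to CSS pixels.
const widthInCssPixels = 612 * PixelsPerInch.PDF_TO_CSS_UNITS;
```

Because all three members are ordinary static data properties, a declaration generator has no getter to overlook, which is the problem the commit message describes.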
Jonas Jenwald
530af48b8e
Merge pull request #14569 from brendandahl/smask-state
Fix canvas state getting out of sync from smasks. (bug 1755507)
2022-02-18 19:35:58 +01:00
Brendan Dahl
7def6d12c8 Fix canvas state getting out of sync from smasks. (bug 1755507)
Soft masks can be enabled/disabled at any time and at different
points in the save/restore stack. This can lead to
the number of save/restores becoming unbalanced across the
two canvases. Instead of save/restoring on the temporary canvas,
change it so we only track state on the main (suspended) canvas.

I was also getting an out-of-balance stack from patterns, so I've also
fixed that and added a warning that will at least show up in Chrome.
It would be nice to add this to Firefox at some point too.

Fixes #11328, #14297 and bug 1755507
2022-02-17 17:38:32 -08:00
Jonas Jenwald
1a31855977 Remove the isStream helper function
At this point all the various Stream-classes extend an abstract base-class, hence this helper function is no longer necessary and only adds unnecessary indirection in the code.
2022-02-17 13:51:36 +01:00
Jonas Jenwald
fd319e94b3 Add a missing string-check in the _collectJS helper function
Unfortunately I don't have a test-case that breaks without this change, however the `stringToPDFString` helper function will fail if anything other than a string is passed to it.
The changes in this patch thus make this code more-or-less identical to that found in the `Catalog.{_collectJavaScript, parseDestDictionary}` methods.
2022-02-16 13:43:42 +01:00
Calixte Denizet
18e3a98c2b [api-minor] Don't add in the text content the chars which are out-of-page (bug 1755201)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1755201;
- if the glyph position is not within the view then skip it.
2022-02-13 21:07:11 +01:00
Tim van der Meij
c37d785b2a
Merge pull request #14560 from Snuffleupagus/Node-ReadableStream-polyfill
[api-minor] Remove the, in `legacy` builds, bundled `ReadableStream` polyfill
2022-02-13 14:08:22 +01:00
Jonas Jenwald
b89595fd20 [api-minor] Remove the, in legacy builds, bundled ReadableStream polyfill
According to the MDN compatibility data, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility, all browsers that we support have native `ReadableStream` implementations (since quite some time too).

Hence only Node.js is now lagging behind w.r.t. `ReadableStream` support, and its experimental implementation doesn't really help us given the life-span of the LTS releases (see https://en.wikipedia.org/wiki/Node.js#Releases).
It seems quite unfortunate to bundle a `ReadableStream` polyfill in the `legacy` builds when it's unnecessary in browsers, given its overall size, but fortunately we can avoid that by simply listing `web-streams-polyfill` as a dependency for the `pdfjs-dist` library.
2022-02-13 10:15:58 +01:00
Jonas Jenwald
d642d34500 Remove the UTF-8 fallback, when TextDecoder is missing, from the Content-Disposition parser
Given that `TextDecoder` is now supported by all modern browsers/environments, please see https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder#browser_compatibility, there's no longer any good reason to keep a UTF-8 fallback in the Content-Disposition parser.
2022-02-12 10:30:25 +01:00
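With `TextDecoder` assumed to be available, as d642d34500 argues, decoding a byte string needs no hand-written UTF-8 fallback. The function below is a hypothetical sketch (the name `decodeByteString` and its fallback behaviour are assumptions), not the Content-Disposition parser itself.

```javascript
// Hypothetical helper: `value` is a string whose characters each represent
// one byte (code units 0x00-0xFF), as produced by percent-decoding.
function decodeByteString(encoding, value) {
  // Strings containing wider characters cannot be a raw byte sequence.
  if (!/^[\x00-\xFF]+$/.test(value)) {
    return value;
  }
  try {
    const bytes = Uint8Array.from(value, ch => ch.charCodeAt(0));
    // `fatal: true` makes malformed input throw instead of yielding U+FFFD.
    return new TextDecoder(encoding, { fatal: true }).decode(bytes);
  } catch (ex) {
    // Unknown encoding label or malformed data: keep the input unchanged.
    return value;
  }
}
```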
Jonas Jenwald
b87a243222 [api-minor] Stop exposing the createObjectURL helper function in the API
With recent changes, specifically PR 14515 *and* the previous patch, the `createObjectURL` helper function is now only used with the SVG back-end.
All other call-sites, throughout the code-base, are now using `URL.createObjectURL(...)` directly and it no longer seems necessary to keep exposing the helper function in the API.
Finally, the `createObjectURL` helper function is moved into the `src/display/svg.js` file to avoid unnecessarily duplicating this code on both the main- and worker-threads.
2022-02-10 12:01:35 +01:00
Brendan Dahl
f8b2a99ddc
Merge pull request #14543 from Snuffleupagus/bug-1753983
Let `Lexer.getNumber` treat a single minus sign as zero (bug 1753983)
2022-02-09 14:06:35 -08:00
Jonas Jenwald
1f0fb270b1 [api-minor] Ensure that the PDFDocumentLoadingTask-promise is rejected when cancelling the PasswordPrompt (bug 1754421)
This is essentially a *continuation* of PR 7926, where we added support for rejecting the current `PDFDocumentLoadingTask`-promise by throwing inside of the `onPassword`-callback.
Hence the naive way to address [bug 1754421](https://bugzilla.mozilla.org/show_bug.cgi?id=1754421) would be to simply throw in the `onPassword`-callback used in the default viewer. However it unfortunately turns out to not work, since the password input/validation is asynchronous, and we thus need another approach.

The simplest solution that I can come up with here, is thus to *extend* the `onPassword`-callback to also reject the current `PDFDocumentLoadingTask`-instance if an `Error` is explicitly passed as the input to the callback function. (This doesn't feel great, but I cannot see a better solution that isn't really complicated.)
2022-02-09 15:09:20 +01:00
Jonas Jenwald
64f3dbeb48 Let Lexer.getNumber treat a single minus sign as zero (bug 1753983)
This appears to be consistent with the behaviour in both Adobe Reader and PDFium (in Google Chrome); this is essentially the same approach as used for a single decimal point in PR 9827.
2022-02-07 17:09:47 +01:00
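The lenient behaviour from 64f3dbeb48 can be illustrated on plain strings. The real `Lexer.getNumber` works character by character on a stream; the function below, `parseLenientNumber`, is a hypothetical string-based analogue written only to show the two special cases.

```javascript
// Hypothetical string-based analogue; not the stream-based Lexer code.
function parseLenientNumber(token) {
  // A lone minus sign or a lone decimal point is treated as zero.
  if (token === "-" || token === ".") {
    return 0;
  }
  // Optional sign, then digits with an optional fraction, or a bare fraction.
  if (!/^[+-]?(?:\d+\.?\d*|\.\d+)$/.test(token)) {
    throw new Error(`Invalid number: ${token}`);
  }
  return Number(token);
}
```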
Jonas Jenwald
03f5f6a421 [api-minor] Update the minimum supported browser versions
Please note that while we "support" some (by now) fairly old browsers, that essentially means that the library (and viewer) will load and that the basic functionality will work as intended.[1]
However, in older browsers, some functionality may not be available and generally we'll ask users to update to a modern browser when bugs (specific to old browsers) are reported.[2]

There's always a question of just how old browsers the PDF.js contributors can realistically support, and here I'm suggesting that we place the cut-off point at approximately *three* years.
With that in mind, this patch updates the *minimum* supported browsers (and environments) as follows:
 - Chrome 73, which was released on 2019-03-12; see https://en.wikipedia.org/wiki/Google_Chrome_version_history
 - Firefox ESR (as before); see https://wiki.mozilla.org/Release_Management/Calendar
 - Safari 12.1, which was released on 2019-03-25; see https://en.wikipedia.org/wiki/Safari_version_history#Safari_12
 - Node.js 12, which was released on 2019-04-23 (and will soon reach EOL); see https://en.wikipedia.org/wiki/Node.js#Releases

---
[1] Assuming a `legacy`-build is being used, of course.

[2] In general it's never a good idea to use an old/outdated browser, since those may contain *known* security vulnerabilities.
2022-02-06 13:06:43 +01:00
Jonas Jenwald
403baa7bba [api-minor] Remove the normalizeWhitespace option in the PDFPageProxy.{getTextContent, streamTextContent} methods (issue 14519, PR 14428 follow-up)
With these changes, we'll now *always* replace all whitespaces with standard spaces (0x20). This behaviour has already been the default for many years in both the viewer and the browser-tests.
2022-02-03 09:17:22 +01:00
calixteman
7a034706ba
Merge pull request #14510 from calixteman/14502
[api-minor] Annotations - Adjust the font size in text field in considering the total width (bug 1721335)
2022-01-30 15:58:51 +01:00
Calixte Denizet
ae842e1c3a [api-minor] Annotations - Adjust the font size in text field in considering the total width (bug 1721335)
- it aims to fix #14502 and bug 1721335;
 - Acrobat and PDFium do the same;
 - it avoids truncated data when printing;
 - change the factor used to compute the font size from the field height: lineHeight = 1.35*fontSize
  - this is the value used by Acrobat.
 - in order to avoid strings being truncated at the bottom, add a few basic metrics for standard fonts.
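The relation between field size and font size can be sketched as follows. This is only an illustration under stated assumptions, not the code from the patch: the function `computeFontSize` and the parameter `textWidthAtUnitSize` (the width of the text when rendered at a font size of 1) are hypothetical, and a single-line field is assumed.

```javascript
// lineHeight = 1.35 * fontSize, the factor quoted in the commit message.
const LINE_HEIGHT_FACTOR = 1.35;

// Hypothetical helper: picks the largest font size that fits both the
// height and the width of a single-line text field.
function computeFontSize(fieldWidth, fieldHeight, textWidthAtUnitSize) {
  // Largest size whose line height still fits in the field.
  const byHeight = fieldHeight / LINE_HEIGHT_FACTOR;
  if (textWidthAtUnitSize <= 0) {
    return byHeight; // Empty text: only the height constrains the size.
  }
  // Largest size at which the whole text fits in the available width.
  const byWidth = fieldWidth / textWidthAtUnitSize;
  return Math.min(byHeight, byWidth);
}
```

In this sketch the width only matters when the text would otherwise overflow; short strings remain limited by the field height alone.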
2022-01-30 15:53:31 +01:00
Jonas Jenwald
7cc761a8c0 Polyfill structuredClone with core-js (PR 13948 follow-up)
This allows us to remove the manually implemented `structuredClone` polyfill, thus reducing the maintenance burden for the `LoopbackPort` class; refer to https://github.com/zloirock/core-js#structuredclone

*Please note:* While `structuredClone` support landed already in Firefox 94, Google Chrome only added it in version 98 (currently in Beta). However, given that the `LoopbackPort` will only be used together with *fake workers* in browsers this shouldn't be too much of a problem.[1]
For Node.js environments, where *fake workers* are unfortunately necessary, using a `legacy/`-build is already required which thus guarantees that the `structuredClone` polyfill is available.
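To illustrate why a message port needs a deep copy at all, here is a minimal sketch of cloning a message before delivery. The helper `cloneMessage` is hypothetical, and its JSON fallback is an assumption made only to keep the example runnable everywhere; it handles plain JSON-compatible data only and is not a substitute for the core-js polyfill mentioned above.

```javascript
// Hypothetical helper, not part of PDF.js.
function cloneMessage(value) {
  if (typeof structuredClone === "function") {
    // Native (or polyfilled) structured cloning.
    return structuredClone(value);
  }
  // Assumption: crude fallback, limited to JSON-compatible data.
  return JSON.parse(JSON.stringify(value));
}

// The receiver gets its own copy, so mutations don't leak back to the sender.
const sent = { page: 1, items: [1, 2, 3] };
const received = cloneMessage(sent);
received.items.push(4);
console.log(sent.items.length, received.items.length);
```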

Also, the patch updates core-js to the latest version since that one includes `structuredClone` improvements; please see https://github.com/zloirock/core-js/releases/tag/v3.20.3

---
[1] Given that we only support browsers with proper worker support, if *fake workers* are being used that essentially indicates a configuration problem/error.
2022-01-27 21:11:42 +01:00
Jonas Jenwald
8f6965b197
Merge pull request #14506 from Snuffleupagus/license_header_2022
Update the year in the `license_header` files
2022-01-27 19:34:56 +01:00
Jonas Jenwald
00bd549e82 Update the year in the license_header files
This also includes a couple of files that are included as-is in the `pdfjs-dist` library.
2022-01-27 19:24:31 +01:00
calixteman
838909f8c1
Merge pull request #14491 from quaoaris/lines-rendered-too-thick
fix for lines (stroke) are rendered too thick  (Bug 1743245)
2022-01-27 18:46:26 +01:00
Calixte Denizet
3a7004ca25 Take into account all rotations before comparing glyph positions
- it aims to fix #14497;
 - previously, only rotations with an angle 0, 90, 180 or 270 were taken into account;
 - so generalize to any angle but keep the fast path for 0, 90, ... because they're likely more common than anything else.
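The idea of a fast path for right angles combined with a general fallback can be sketched like this. The function `rotatePoint` is hypothetical and is not taken from the patch; it rotates a point counter-clockwise around the origin, with the angle given in degrees.

```javascript
// Hypothetical helper: rotates (x, y) by `angle` degrees counter-clockwise.
function rotatePoint(x, y, angle) {
  // Normalize the angle into the range [0, 360).
  const a = ((angle % 360) + 360) % 360;
  // Fast path: right angles need no trigonometry.
  switch (a) {
    case 0:
      return [x, y];
    case 90:
      return [-y, x];
    case 180:
      return [-x, -y];
    case 270:
      return [y, -x];
  }
  // General case: apply the standard 2D rotation matrix.
  const rad = (a * Math.PI) / 180;
  const cos = Math.cos(rad);
  const sin = Math.sin(rad);
  return [x * cos - y * sin, x * sin + y * cos];
}
```

Once all glyph positions are rotated back into a common frame this way, they can be compared along a single axis regardless of the text angle.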
2022-01-26 17:19:00 +01:00
quaoaris
3f77d80f31 fix for lines (stroke) are rendered too thick (Bug 1743245)
This commit fixes Bug 1743245 (Grided PDF file lines rendered too thick), which was caused by the fix for #12868.
The lineWidth was set to round(1 * this._combinedScaleFactor) when the pixel is drawn as a parallelogram with a height < 1. This fix changes this to floor(1 * this._combinedScaleFactor).
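The difference between the two roundings is easy to see with a small sketch. The function names are hypothetical and `scale` merely stands in for the combined scale factor; this is not the code from canvas.js.

```javascript
// Hypothetical illustration of the two rounding strategies.
function lineWidthRounded(scale) {
  return Math.round(1 * scale); // Previous behaviour described above.
}

function lineWidthFloored(scale) {
  return Math.floor(1 * scale); // Behaviour after this fix.
}

// For a scale factor of 1.6, rounding gives 2 while flooring gives 1.
console.log(lineWidthRounded(1.6), lineWidthFloored(1.6));
```

For any scale factor whose fractional part is at least 0.5, rounding yields a line one unit wider than flooring, which is the thickening the fix addresses.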

This change shows a visual result comparable to Chrome and Acrobat.
Regarding the last PR, three statements in canvas.js are affected and will change with this commit (stroke and paintChar).

Rename the reference files to follow the naming convention.
2022-01-25 10:27:30 +01:00
Jonas Jenwald
8836593b9e Add a (global) cache to the getCharUnicodeCategory function
Given that the regular expression has already become more complex (after the initial patch adding it), it seems to me that it probably cannot hurt to add a global cache to reduce unnecessary re-parsing.
Obviously the `Glyph`-instances are being cached *per* font, however in most documents multiple fonts are being used and in practice there's very often a fair amount of overlap between the /ToUnicode-data in different fonts[1].

Consider for example loading and rendering the entire `tracemonkey.pdf` document (from the test-suite), which isn't a particularly large document. In that case the `getCharUnicodeCategory` function is being called a total of `601` times, however there's only `106` *unique* unicode-chars being checked.
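A global cache of this kind can be sketched with a plain `Map`. The names `classifyChar` and `getCategoryCached`, as well as the two properties computed here, are hypothetical stand-ins and do not reproduce the actual `getCharUnicodeCategory` logic:

```javascript
// Hypothetical stand-in for a regex-based, comparatively costly classification.
function classifyChar(char) {
  return {
    isWhitespace: /^\s$/u.test(char),
    isNonSpacingMark: /^\p{Mn}$/u.test(char),
  };
}

// Module-level cache shared by all fonts, keyed by the character itself.
const categoryCache = new Map();

function getCategoryCached(char) {
  let category = categoryCache.get(char);
  if (category === undefined) {
    // First time this character is seen: classify it and remember the result.
    category = classifyChar(char);
    categoryCache.set(char, category);
  }
  return category;
}
```

With numbers like the ones quoted above, such a cache would run the regular expressions once per unique character instead of once per call.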

*Please note:* In practice I suppose that this won't have a *huge* effect on overall performance, however given the relative simplicity of this patch I figured that it'd not hurt to submit it for review.

---
[1] Consider e.g. how there's usually different fonts used for regular, bold, respectively italic text.
2022-01-25 09:59:34 +01:00
Calixte Denizet
e1d3a3b414 Remove the invisible format marks from the text chunks
- it aims to fix issue #9186.
2022-01-24 13:47:24 +01:00
calixteman
88236e1163
Merge pull request #14430 from calixteman/beforeinput
[JS] Use beforeinput event to trigger a keystroke event in the sandbox
2022-01-23 20:42:33 +01:00
Calixte Denizet
6ac296e48e [JS] Use beforeinput event to trigger a keystroke event in the sandbox
- it aims to fix issue #14307;
 - this event has been added recently in Firefox and we can now use it;
 - fix a few bugs in aform.js or in annotation_layer.js;
 - add some integration tests to test keystroke events (see `AFSpecial_Keystroke`);
 - make dispatchEvent in the quickjs sandbox async.
2022-01-23 19:53:01 +01:00
Tim van der Meij
23b6fde9fc
Merge pull request #14464 from Snuffleupagus/issue-14462
Support Type1 font files with incomplete /CharStrings definitions (issue 14462)
2022-01-19 20:38:46 +01:00
calixteman
b0231cc887
Merge pull request #14456 from calixteman/1749563
Font renderer - get int8 instead of uint8 in composite glyphes (bug 1749563)
2022-01-19 01:20:49 -08:00
Calixte Denizet
74f25d2755 Font renderer - get int8 instead of uint8 in composite glyphes (bug 1749563)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1749563;
 - use some helper functions to get (u|i)int** values from the buffer: this makes the code clearer;
 - in composite glyphs the translation values that accompany a transformation are signed, so read them as int8 instead of uint8;
 - add a few TODOs.
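The difference between the signed and unsigned reads can be sketched with two small helpers. The names `readUint8` and `readInt8` are hypothetical and are not necessarily those used in the font renderer:

```javascript
// Hypothetical helpers for reading single bytes from a Uint8Array.
function readUint8(data, offset) {
  return data[offset]; // Value in the range 0..255.
}

function readInt8(data, offset) {
  // Shift the byte to the top of a 32-bit integer, then shift back with
  // sign extension to get a value in the range -128..127.
  return (data[offset] << 24) >> 24;
}

// The byte 0xfb reads as 251 when unsigned, but as -5 when signed.
const bytes = new Uint8Array([0xfb, 0x05]);
console.log(readUint8(bytes, 0), readInt8(bytes, 0));
```

Reading a negative translation as unsigned therefore shifts the component far in the wrong direction, which is the kind of misplacement a signed read avoids.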
2022-01-18 22:06:23 +01:00
Jonas Jenwald
a13ae5d97d Support Type1 font files with incomplete /CharStrings definitions (issue 14462)
Please refer to https://www.pdfa.org/norm-refs/Type1Fonts.pdf#page=15 for the expected format for the /CharStrings entries.
In the referenced PDF document the /CharStrings are missing the expected end-token, which causes us to swallow the start of the next glyph name.
2022-01-17 18:55:22 +01:00
Jonas Jenwald
ba37d600d7 Make the normalizeWhitespace handling, in the PartialEvaluator, more efficient (PR 14428 follow-up)
After the changes in PR 14428 we can *directly*, and more efficiently, handle whitespace conversion in `PartialEvaluator.getTextContent` when the `normalizeWhitespace` option is being used.
This way we no longer need a separate helper function for this, and can avoid having to (again) iterate through the text and check each character. Finally, this also removes the need for using a regular expression on e.g. all non-ASCII text.
2022-01-16 08:29:21 +01:00
calixteman
da953f4b64
Merge pull request #14428 from calixteman/typo
Use the correct dimension to know if we have to add an EOL in vertical mode
2022-01-15 12:47:10 -08:00
Calixte Denizet
9dae421a0d Handle all the whitespaces the same way when creating text chunks
2022-01-15 21:44:00 +01:00
Tim van der Meij
922dac035c
Merge pull request #14448 from Snuffleupagus/Type3-circular-refs
Prevent circular references in Type3 fonts
2022-01-15 14:11:47 +01:00
Tim van der Meij
a72d188599
Merge pull request #14439 from Snuffleupagus/issue-14438
Ignore Annotations with empty /Rect-entries in the display-layer (issue 14438)
2022-01-15 14:11:25 +01:00