Commit Graph

2711 Commits

Author SHA1 Message Date
Calixte Denizet
5dc7f4ade8 XFA - CDATA can be xml so parse it when required 2021-06-07 10:38:39 +02:00
Calixte Denizet
112645ea3d XFA - Don't bind a form node with an empty value when the data node doesn't exist 2021-06-06 17:59:01 +02:00
Jonas Jenwald
20770cb06a Improve text-selection for Type3 fonts with empty /FontBBox-entries (issue 6605)
For Type3 fonts where the /CharProcs-streams of the individual glyph starts with a `d1` operator, we can use that to build a fallback bounding box for the font and thus improve text-selection in some cases.
2021-06-05 08:09:29 +02:00
Brendan Dahl
6255c2a8f3
Merge pull request #13376 from calixteman/6132
Replace command with not enough args by an endchar in CFF font
2021-06-04 14:00:51 -07:00
Calixte Denizet
11573ddd16 XFA - Implement usehref support
- attribute 'use' was already implemented but not usehref
  - in general, usehref should make reference to current document
  - add support for SOM expressions in use and usehref to search a node.
  - get prototype for all nodes if any.
2021-06-04 14:57:05 +02:00
Jonas Jenwald
af78ba64bd Don't change options of the globally used PartialEvaluator in the "should render checkbox with fallback font for printing" unit-test
Given that the same `PartialEvaluator`-instance is used for a lot of these unit-tests, manually changing the options in any one test-case could lead to intermittently failing unit-tests since they're run in a random order.
To fix this, we simply have to use the existing method to clone the `PartialEvaluator`-instance but with the custom options.
2021-05-31 12:14:58 +02:00
calixteman
8c53bf8647
Merge pull request #13437 from calixteman/xfa_mv_root
XFA - Move the fake HTML representation of XFA from the worker to the main thread
2021-05-31 10:14:15 +02:00
Tim van der Meij
a0ce3cb3b4
Merge pull request #13448 from Snuffleupagus/_setDefaultAppearance-alpha
Support strokeAlpha/fillAlpha when creating a fallback appearance stream (issue 6810)
2021-05-28 23:39:36 +02:00
Jonas Jenwald
707a9e3b02 Work-around for HighlightAnnotations without a top-level /ExtGState-entry (issue 13242)
For HighlightAnnotations with a built-in appearance stream, we still rely on it to specify the opacity correctly via a suitable blend mode. However, if the Annotation-drawing operators are placed *within* a /XObject of the /Form-type, the /ExtGState won't apply to the final rendering and the result is that the highlighting obscures the underlying text.

The more *correct* and general solution would likely be to somehow modify the implementation in `src/display/canvas.js`, to special-case handling of /Form-type /XObjects when rendering Annotations. Since we can very easily work-around this problem for now by using the "no appearance stream" code-path, doing *something* here ought to be preferable.

This patch is (obviously) merely a work-around, but given that the referenced issue is (as far as I know) the first case we've seen of this problem a simple solution will hopefully suffice for now.
2021-05-28 13:49:27 +02:00
Jonas Jenwald
a6447f2ca2 Support strokeAlpha/fillAlpha when creating a fallback appearance stream (issue 6810)
This fixes the colours, by respecting the strokeAlpha/fillAlpha-values, for a couple of Annotations in the PDF document from issue 13447.[1]

---
[1] Some of the annotations still won't render at all, when compared with Adobe Reader, but that could/should probably be handled separately.
2021-05-27 16:23:18 +02:00
Calixte Denizet
45c3f00a27 XFA - Move the fake HTML representation of XFA from the worker to the main thread
- the only goal of this patch is to be able to get synchronously the fake html when printing from firefox:
    - in order to print we need to inject some html in beforeprint callback but we cannot block in waiting for all the pages.
  - from a memory point of view: it doesn't change anything since the fake HTML is deleted in the worker;
  - this way we don't break any assumptions.
2021-05-25 19:33:07 +02:00
Calixte Denizet
7cebdbd58c XFA - Fix lot of layout issues
- I thought it was possible to rely on browser layout engine to handle layout stuff but it isn't possible
    - mainly because when a contentArea overflows, we must continue to layout in the next contentArea
    - when no more contentArea is available then we must go to the next page...
    - we must handle breakBefore and breakAfter which allows to "break" the layout to go to the next container
  - Sometimes some containers don't provide their dimensions so we must compute them in order to know where to put
    them in their parents but to compute those dimensions we need to layout the container itself...
  - See top of file layout.js for more explanations about layout.
  - fix few bugs in other places I met during my work on layout.
2021-05-25 17:51:36 +02:00
Tim van der Meij
99430225b0
Drop obsolete logic from the downloadFile function in test/downloadutils.js
This code is old and predates the improvements we made to the test
manifest to only contain working URLs (either Web Archive or
GitHub/Bugzilla links), so the fallback logic to try the Web Archive is
no longer necessary. This greatly simplifies the function and also
makes sure that we fail directly in case a bad URL is added to the
manifest, instead of having it work "accidentally" because of this
logic, since we want the manifest to be correct at all times (and
otherwise fail loudly).
2021-05-22 14:45:42 +02:00
Tim van der Meij
d1d9b9043d
Merge pull request #13415 from Snuffleupagus/getDestination-out-of-order
Improve handling of named destinations in out-of-order NameTrees (PR 10274 follow-up)
2021-05-21 20:15:09 +02:00
Jonas Jenwald
8d5689387b Improve handling of named destinations in out-of-order NameTrees (PR 10274 follow-up)
According to the specification, see https://web.archive.org/web/20210404042322if_/https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.2384179, the keys of a NameTree/NumberTree should be ordered.
For corrupt PDF files, which violate this assumption, it's thus possible that trying to lookup a single entry fails.

Previously, in PR 10274, we implemented a fallback that only applies to the "bottom" node of a NameTree/NumberTree, which in general might not actually help for sufficiently corrupt NameTree/NumberTree data.
Instead we remove the current *limited* fallback from `NameOrNumberTree.get`, and defer to the call-site to handle this case explicitly e.g. by using `NameOrNumberTree.getAll` for data where that makes sense. For well-formed documents, these changes should *not* lead to any additional data fetching/parsing.

Finally, as part of these changes, the validation of named destination data is improved in the `Catalog` and a new unit-test is also added.
2021-05-21 15:48:37 +02:00
Jonas Jenwald
1a8d05fdcf Remove some, with Prettier 2.3.0, unnecessary // prettier-ignore comments
To get the maximum benefit from something like Prettier, you obviously don't want to disable the automatic formatting unless absolutely necessary. When we added Prettier there were a number of cases, mostly involving larger Arrays, which required disabling of the automatic formatting for overall readability and/or to not break inline comments.

With changes in Prettier version `2.3.0`, see [the release notes](https://prettier.io/blog/2021/05/09/2.3.0.html#concise-formatting-of-number-only-arrays-10106httpsgithubcomprettierprettierpull10106-10160httpsgithubcomprettierprettierpull10160-by-thorn0httpsgithubcomthorn0), there's now better formatting support for Arrays containing only numbers. Hence we can now remove a number of `// prettier-ignore` comments, and thus get the benefit of automatic formatting in (slightly) more of the code-base.
2021-05-19 11:36:03 +02:00
Calixte Denizet
4544ebf38a Handle PI with no value in xml parser
- an XML PI contains a target and optionally some content (see https://en.wikipedia.org/wiki/Processing_Instruction)
  - the parser expected to always have some content and so it could lead to wrong parsing.
2021-05-18 10:22:18 +02:00
Brendan Dahl
17e9cfcd2a
Merge pull request #13328 from calixteman/js_display1
JS - Add support for display property
2021-05-17 08:47:13 -07:00
Jonas Jenwald
8943bcd3c3 Account for formatting changes in Prettier version 2.3.0
With the exception of one tweaked `eslint-disable` comment, in `web/generic_scripting.js`, this patch was generated automatically using `gulp lint --fix`.

Please find additional information at:
 - https://github.com/prettier/prettier/releases/tag/2.3.0
 - https://prettier.io/blog/2021/05/09/2.3.0.html
2021-05-16 11:44:05 +02:00
Calixte Denizet
1a2cea21a5 Replace command with not enough args by an endchar in CFF font
- Right now, a glyph with an erroneous outline is replaced by an empty glyph
    if the error is far enough from the start there's likely something to render
    so the idea is to replace a command with args by an endchar when no args are
    on the stack: this way OTS is likely happy (no remaining args on stack) and we
    can draw something which is likely better than nothing.
2021-05-14 13:45:45 +02:00
Brendan Dahl
53991d0924 Fix tiling pattern with smask.
After drawing a tiling pattern we were not calling
endDrawing, which handles compositing any
active smasks.

Fixes #8565.
2021-05-12 11:42:08 -07:00
Tim van der Meij
ba99e54c66
Merge pull request #13361 from brendandahl/patterns-fixes
Fix several issues with radial/axial shadings and tiling patterns.
2021-05-12 20:27:37 +02:00
Jonas Jenwald
757636d519 Convert the remaining functions in src/core/primitives.js to use standard classes
This patch was tested using the PDF file from issue 2618, i.e. https://bug570667.bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
    {  "id": "issue2618",
       "file": "../web/pdfs/issue2618.pdf",
       "md5": "",
       "rounds": 50,
       "type": "eq"
    }
]
```

which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |   %  | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ---- | -------------
firefox | Overall      |    50 |         3417 |        3426 |   9 | 0.27 |
firefox | Page Request |    50 |            1 |           1 |   0 | 5.41 |
firefox | Rendering    |    50 |         3416 |        3426 |   9 | 0.27 |
```

Based on these results, there's no significant performance regression from using standard classes and this patch should thus be OK.
2021-05-12 09:36:28 +02:00
Brendan Dahl
ac44afa70e Fix several issues with radial/axial shadings and tiling patterns.
Previously, we set the base transformation and pattern matrix
directly to the main rendering ctx of the page, however doing this
caused the current transform to be lost. This would cause issues
with things like shear missing so the pattern was misaligned or when
stroke was used the scale of the line width or dash would be wrong.
Instead we should leave the current transform and use setTransfrom
on the pattern so it is applied correctly. For axial and radial shadings I had
to create a temporary canvas to draw the shading so I could in turn
use setTransform.

Fixes: #13325, #6769, #7847, #11018, #11597, #11473

The following already in the corpus are improved:
issue8078-page1
issue1877-page1
2021-05-11 16:32:24 -07:00
Calixte Denizet
38503d1c5f Fix some integration tests 2021-05-08 16:27:45 +02:00
Jonas Jenwald
fc59a5f709 Take the W array into account when computing the hash, in PartialEvaluator.preEvaluateFont, for composite fonts (issue 13343)
Without this some *composite* fonts may incorrectly end up with matching `hash`es, thus breaking rendering since we'll not actually try to load/parse some of the fonts.

*Please note:* Given that the document, in the referenced issue, doesn't embed *any* of its fonts there's no guarantee that it renders correctly in all configurations even with this patch.
2021-05-07 21:22:36 +02:00
Calixte Denizet
af125cd299 JS - Add support for display property
- in annotation_layer, move common properties treatment in a common method instead having duplicated code in each widget.
2021-05-06 11:15:38 +02:00
Tim van der Meij
afb8c4fd25
Merge pull request #13327 from Snuffleupagus/split-fonts
Split the functionality in `src/core/fonts.js` into multiple files, and use standard classes
2021-05-05 20:16:24 +02:00
Calixte Denizet
451091b89b Fix integration test in the windows bot 2021-05-05 19:05:08 +02:00
Calixte Denizet
3f29892d63 [JS] Fix several issues found in pdf in #13269
- app.alert and few other function can use an object as parameter ({cMsg: ...});
  - support app.alert with a question and a yes/no answer;
  - update field siblings when one is changed in an action;
  - stop calculation if calculate is set to false in the middle of calculations;
  - get a boolean for checkboxes when they've been set through annotationStorage instead of a string.
2021-05-04 19:21:51 +02:00
Calixte Denizet
549aae6c3d JS -- add support for page property in field 2021-05-03 15:46:29 +02:00
Jonas Jenwald
77b258440b Move some constants and helper functions from src/core/fonts.js and into their own file
- `FontFlags`, is used in both `src/core/fonts.js` and `src/core/evaluator.js`.
 - `getFontType`, same as the above.
 - `MacStandardGlyphOrdering`, is a fairly large data-structure and `src/core/fonts.js` is already a *very* large file.
 - `recoverGlyphName`, a dependency of `type1FontGlyphMapping`; please see below.
 - `SEAC_ANALYSIS_ENABLED`, is used by both `Type1Font`, `CFFFont`, and unit-tests; please see below.
 - `type1FontGlyphMapping`, is used by both `Type1Font` and `CFFFont` which a later patch will move to their own files.
2021-05-02 21:00:29 +02:00
Jonas Jenwald
6912bb5e0a Move the IdentityToUnicodeMap/ToUnicodeMap from src/core/fonts.js and into its own file 2021-05-02 21:00:29 +02:00
Jonas Jenwald
883ce5d120 Fix highlighting of search results when the textLayer contains br-elements (PR 13257 follow-up, issue 13323)
Apparently we need to layout `br`-elements in the same *exact* way as the regular `span`-elements which contain the text-content.
2021-05-02 15:36:01 +02:00
Tim van der Meij
f6f335173d
Merge pull request #13303 from Snuffleupagus/BaseStream
Add an abstract base-class, which all the various Stream implementations inherit from
2021-05-01 19:13:36 +02:00
calixteman
af4dc55019
[api-minor] Fix the way to chunk the strings (#13257)
- Improve chunking in order to fix some bugs where the spaces aren't here:
    * track the last position where a glyph has been drawn;
    * when a new glyph (first glyph in a chunk) is added then compare its position with the last saved one and add a space or break:
      - there are multiple ways to move the glyphs and to avoid to have to deal with all the different possibilities it's a way easier to just compare positions;
      - and so there is now one function (i.e. "compareWithLastPosition") where all the job is done.
  - Add some breaks in order to get lines;
  - Remove the multiple whites spaces:
    * some spaces were filled with several whites spaces and so it makes harder to find some sequences of words using the search tool;
    * other pdf readers replace spaces by one white space.

Update src/core/evaluator.js

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
2021-04-30 14:41:13 +02:00
Jonas Jenwald
66d9d83dcb Move the PredictorStream from src/core/stream.js and into its own file 2021-04-28 10:16:51 +02:00
Brendan Dahl
d10da907da
Fix position of highlighted all text. (#13306)
Adds a new integration test to ensure we don't
regress this again.
2021-04-28 10:15:31 +02:00
Tim van der Meij
60ab15427f
Implement rendering polyline/polygon annotations without appearance stream 2021-04-27 19:02:20 +02:00
Jonas Jenwald
6f4394fcd8
Support InkAnnotations without appearance streams (issue 13298) (#13301)
For now, we keep things purposely simple by using straight lines (rather than curves); please see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2096579
2021-04-27 11:49:03 +02:00
Tim van der Meij
da0e7ea969
Merge pull request #13272 from calixteman/issue13271
Update all the text widgets having the same name with the same value
2021-04-23 21:08:54 +02:00
Jonas Jenwald
57a1ea840f
Ensure that saveDocument works if there's no /ID-entry in the PDF document (issue 13279) (#13280)
First of all, while it should be very unlikely that the /ID-entry is an *indirect* object, note how we're using `Dict.get` when parsing it e.g. in `PDFDocument.fingerprint`. Hence we definitely should be consistent here, since if the /ID-entry is an *indirect* object the existing code in `src/core/writer.js` would already fail.
Secondly, to fix the referenced issue, we also need to check that the /ID-entry actually is an Array before attempting to access its contents in `src/core/writer.js`.

*Drive-by change:* In the `xrefInfo` object passed to the `incrementalUpdate` function, re-name the `encrypt` property to `encryptRef` since its data is fetched using `Dict.getRaw` (given the names of the other properties fetched similarly).
2021-04-22 12:08:56 +02:00
Jonas Jenwald
7b8d2495ca Convert the font-test ttx helper function to use the Fetch API
By replacing `XMLHttpRequest` with a `fetch` call, the helper function can be modernized to use async/await instead.
Note that the headers doesn't seem necessary to set now, since:
 - The Fetch API provides a method for accessing the response as *text*, which renders the "Content-type" header unnecessary.
 - According to https://developer.mozilla.org/en-US/docs/Glossary/Forbidden_header_name, the "Content-length" header isn't necessary.
2021-04-20 23:44:15 +02:00
Calixte Denizet
e868ab0051 Update all the text widgets having the same name with the same value 2021-04-20 20:03:19 +02:00
Jonas Jenwald
3d55b2b10e Replace done callbacks in the font-tests with async/await instead 2021-04-19 13:26:39 +02:00
Tim van der Meij
d42f3d0bfe
Convert done callbacks to async/await in test/unit/evaluator_spec.js 2021-04-18 14:20:54 +02:00
Tim van der Meij
f4237d3a09
Convert done callbacks to async/await in test/unit/annotation_spec.js 2021-04-17 19:59:18 +02:00
Tim van der Meij
c2f3a71eca
Convert done callbacks to async/await in test/unit/api_spec.js 2021-04-17 17:52:23 +02:00
Jonas Jenwald
f560fe6875 A couple of small scripting/XFA-related tweaks in the worker-code
- Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible.

 - Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.)

 - Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be.
   Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.)

 - Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.
2021-04-17 10:34:22 +02:00
Brendan Dahl
ac3fa1e3d7
Merge pull request #13146 from calixteman/xfa_fonts
XFA -- Load fonts permanently from the pdf
2021-04-16 12:55:12 -07:00
Tim van der Meij
6e8ff2fed9
Merge pull request #13247 from Snuffleupagus/update-yargs
Update the `yargs` package to the latest version
2021-04-16 20:33:45 +02:00
Tim van der Meij
cba6a3f375
Merge pull request #13246 from timvandermeij/unit-test-async-await-pt2
Convert done callbacks to async/await in more unit test files
2021-04-16 20:24:53 +02:00
Jonas Jenwald
c988712bc5 Update the yargs package to the latest version
While I wasn't able to figure out *exactly* why the old format didn't work, re-factoring the `parseOptions` function to use `yargs` differently "just worked" so that's hopefully good enough here.
With these changes everything related to a *particular* option now appears in one place, rather than being spread out, which aids readability in my opinion. Also, the options are now sorted alphabetically, to make it easier to find a particular one.

https://www.npmjs.com/package/yargs
2021-04-16 12:04:35 +02:00
Calixte Denizet
7e9579045f XFA -- Load fonts permanently from the pdf
- Different fonts can be used in xfa and some of them are embedded in the pdf.
  - Load all the fonts in window.document.

Update src/core/document.js

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>

Update src/core/worker.js

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
2021-04-15 17:57:42 +02:00
Tim van der Meij
38ed655562
Convert done callbacks to async/await in test/unit/cmap_spec.js 2021-04-14 22:24:28 +02:00
Tim van der Meij
046467ff47
Drop obsolete done callbacks in test/unit/annotation_storage_spec.js
There is no asynchronous code involved here, so we can get rid of all
done callbacks here and simply use the fact that if the function call
ends without failed assertion that the test passed.
2021-04-14 22:11:45 +02:00
Tim van der Meij
82bdba78fb
Drop obsolete done callbacks in test/unit/crypto_spec.js
There is no asynchronous code involved here, so we can get rid of all
done callbacks here and simply use the fact that if the function call
ends without failed assertion that the test passed.
2021-04-14 22:09:17 +02:00
Tim van der Meij
43eb4302ff
Convert done callbacks to async/await in test/unit/message_handler_spec.js 2021-04-14 21:59:13 +02:00
Tim van der Meij
bc8c0bbbfd
Convert done callbacks to async/await in test/unit/display_svg_spec.js 2021-04-14 21:59:13 +02:00
Tim van der Meij
ae48d07582
Merge pull request #13243 from janpe2/ocg-ve
Implement visibility expressions for optional content
2021-04-14 20:42:49 +02:00
Tim van der Meij
cd2c4e277c
Merge pull request #13222 from timvandermeij/unit-test-async
Convert done callbacks to async/await in the smaller unit test files
2021-04-14 20:37:17 +02:00
Jani Pehkonen
3a96977ea8 Implement visibility expressions for optional content 2021-04-14 17:39:41 +03:00
Tim van der Meij
c1e9f6025f
Convert done callbacks to async/await in test/unit/custom_spec.js 2021-04-13 21:51:27 +02:00
Tim van der Meij
a1c1e1b9f8
Convert done callbacks to async/await in test/unit/fetch_stream_spec.js 2021-04-13 21:51:27 +02:00
Tim van der Meij
5607484402
Convert done callbacks to async/await in test/unit/network_spec.js 2021-04-13 21:51:26 +02:00
Tim van der Meij
fcf4d02fca
Convert done callbacks to async/await in test/unit/node_stream_spec.js 2021-04-13 21:51:26 +02:00
Tim van der Meij
99dc0d6b65
Convert done callbacks to async/await in test/unit/primitives_spec.js 2021-04-13 21:50:13 +02:00
Tim van der Meij
a56ffb92be
Convert done callbacks to async/await in test/unit/ui_utils_spec.js 2021-04-13 21:50:13 +02:00
Tim van der Meij
a2811e925d
Convert done callbacks to async/await in test/unit/util_spec.js 2021-04-13 21:47:53 +02:00
Jonas Jenwald
2b2234fd5a [api-minor] Ensure that PDFDocumentProxy.hasJSActions won't fail if MissingDataExceptions are thrown during the associated worker-thread parsing
With the current implementation of `PDFDocument.hasJSActions`, in the worker-thread, we're not actually handling not-yet-loaded data correctly. This can thus fail in *two* different ways:
 - The `PDFDocument.fieldObjects` getter (and its helper method), while it may *return* a Promise, still fetches all of its data synchronously and it can thus throw a `MissingDataException` during parsing.
 - The `Catalog.jsActions` getter, which is completely synchronous, can obviously throw a `MissingDataException` during parsing.

If either of these cases occur currently, the `PDFDocumentProxy.hasJSActions` method in the API can either return a *rejected* Promise (which it never should) or possibly "hang" and never resolve.

*Please note:* While I've not *yet* seen this error in an actual PDF document, it can happen during loading if you're unlucky enough with e.g. the structure of the PDF document and/or the download speed offered by the server.
This patch is thus based on code-inspection *and* on manually throwing a `MissingDataException` on the first access of `Catalog.jsActions` to simulate this situation.

Finally, this patch adds a couple of *API* unit-tests for this (since none existed).
2021-04-13 14:33:56 +02:00
Calixte Denizet
a4c986515f XFA -- Display text content
- display xhtml;
  - allow spaces in xhtml (xfa-spacerun:yes);
  - support column layout;
  - fix some border issues.
2021-04-12 14:13:49 +02:00
Jonas Jenwald
5adee0cdd1 [api-minor] Let PDFPageProxy.getStructTree return null, rather than an empty structTree, for documents without any accessibility data (PR 13171 follow-up)
This is first of all consistent with existing API-methods, where we return `null` when the data in question doesn't exist. Secondly, it should also be (slightly) more efficient since there's less dummy-data that we need to transfer between threads.
Finally, this prevents us from adding an empty/unnecessary span to *every* single page even in documents without any structure tree data.
2021-04-11 12:35:33 +02:00
Tim van der Meij
10574a0f8a
Remove obsolete done callbacks from the unit tests
The done callbacks are an outdated mechanism to signal Jasmine that a
unit test is done, mostly in cases where a unit test needed to wait for
an asynchronous operation to complete before doing its assertions.
Nowadays a much better mechanism is in place for that, namely simply
passing an asynchronous function to Jasmine, so we don't need callbacks
anymore (which require more code and may be more difficult to reason
about).

In these particular cases though the done callbacks never had any real
use since nothing asynchronous happens in these places. Synchronous
functions don't need to use done callbacks since Jasmine simply knows
it's done when the function reaches its normal end, so we can safely get
rid of these callbacks. The telltale sign is if the done callback is
used unconditionally at the end of the function.

This is all done in an effort to over time get rid of all callbacks in
the unit test code.
2021-04-10 20:29:39 +02:00
Tim van der Meij
d9d626a5e1
Merge pull request #13214 from calixteman/signatures
Display widget signature
2021-04-10 19:35:16 +02:00
Calixte Denizet
5875ebb1ca Display widget signature
- but don't validate them for now;
  - Firefox will display a bar to warn that the signature validation is not supported (see https://bugzilla.mozilla.org/show_bug.cgi?id=854315)
  - almost all (all ?) pdf readers display signatures;
  - validation is done in edge but for now it's behind a pref.
2021-04-10 19:13:28 +02:00
Tim van der Meij
03c8c89002
Merge pull request #13171 from brendandahl/struct-tree
[api-minor] Add support for basic structure tree for accessibility.
2021-04-09 21:32:44 +02:00
Brendan Dahl
fc9501a637 Add support for basic structure tree for accessibility.
When a PDF is "marked" we now generate a separate DOM that represents
the structure tree from the PDF.  This DOM is inserted into the <canvas>
element and allows screen readers to walk the tree and have more
information about headings, images, links, etc. To link the structure
tree DOM (which is empty) to the text layer aria-owns is used. This
required modifying the text layer creation so that marked items are
now tracked.
2021-04-09 09:56:28 -07:00
Jonas Jenwald
72ef183085 [api-minor] Remove the manual passing of an AnnotationStorage-instance when calling various API-method
Note how we purposely don't expose the `AnnotationStorage`-class directly in the official API (see `src/pdf.js`), since trying to use *multiple* ones simultaneously doesn't really make sense (e.g. in the viewer).
Instead we lazily initialize, and cache, just *one* instance via `PDFDocumentProxy.annotationStorage` which should thus be available internally in the API itself without having to be manually passed to various methods.

To support these changes, the `AnnotationStorage`-instance initialization is moved into the `WorkerTransport`-class to allow both `PDFDocumentProxy` and `PDFPageProxy` to access it.
This patch implements the following simplifications:
 - Remove the `annotationStorage`-parameter from `PDFDocumentProxy.saveDocument`, since it's already available internally.
   Furthermore, while it's currently possible to call that method without an `AnnotationStorage`-instance, that really does *not* make any sense at all. In this case you're effectively reducing `PDFDocumentProxy.saveDocument` to a "regular" `PDFDocumentProxy.getData` call, but with *a lot* more overhead, which was obviously not the intention of the `PDFDocumentProxy.saveDocument`-method.

 - Try to discourage third-party users from calling `PDFDocumentProxy.saveDocument` unconditionally, as a replacement for `PDFDocumentProxy.getData` (note the previous point).

 - Replace the `annotationStorage`-parameter, in `PDFPageProxy.render`, with a boolean `includeAnnotationStorage`-parameter which simply indicates if the (internally available) `AnnotationStorage`-instance should be used during rendering (e.g. for printing).

 - By removing the need to *manually* provide `annotationStorage`-parameters to various API-methods, using the API should become simpler (e.g. for third-parties) since you no longer need to worry about manually fetching and passing around this data.
2021-04-09 13:24:25 +02:00
Jonas Jenwald
f986ccdf0e Fuzzy-match the fontName, for TrueType Collection fonts, where the "name"-table is wrong (issue 13193)
The fontName, as defined in the PDF document, cannot be found in *any* of the "name"-tables in the TrueType Collection font. To work-around that, this patch adds a *fallback* code-path to allow using an approximately matching fontName rather than outright failing.
2021-04-07 15:25:32 +02:00
Tim van der Meij
228adbf673
Merge pull request #13172 from Snuffleupagus/cleanup-keepFonts
[api-minor] Add an option, in `PDFDocumentProxy.cleanup`, to allow fonts to remain attached to the DOM
2021-04-05 14:21:34 +02:00
Jonas Jenwald
232fbd28e1 Re-factor the PDFDocumentProxy.cleanup unit-tests to use async/await 2021-04-02 12:32:35 +02:00
Jonas Jenwald
0eb1433c78 [api-minor] Change the format of the fontName-property, in defaultAppearanceData, on Annotation-instances (PR 12831 follow-up)
Currently the `fontName`-property contains an actual /Name-instance, which is a problem given that its fallback value is an empty string; see ca7f546828/src/core/default_appearance.js (L35)
The reason that this is a problem can be seen in ca7f546828/src/core/primitives.js (L30-L34), since an empty string short-circuits the cache. Essentially, in PDF documents, a /Name-instance cannot be empty and the way that the `DefaultAppearanceEvaluator` does things is unfortunately not entirely correct.

Hence the `fontName`-property is changed to instead contain a string, rather than a /Name-instance, which simplifies the code overall.

*Please note:* I'm tagging this patch with "[api-minor]", since PR 12831 is included in the current pre-release (although we're not using the `fontName`-property in the display-layer).
2021-04-01 16:47:30 +02:00
Tim van der Meij
5a64157a2f
Merge pull request #13168 from janpe2/ttf-uni-glyphs
Use post table when Encoding has only Differences
2021-03-31 21:35:13 +02:00
Jani Pehkonen
0117ee5071 Use post table when Encoding has only Differences
Fixes #13107
In the issue, some TrueType glyph names have the format `uniXXXX`.
Font's `Encoding` dictionary has the entry `Differences` but no
`BaseEncoding`. `uniXXXX` names are converted to glyph indices
using font's `post` table but currently that is done only when
`BaseEncoding` exists. We must enable the conversion also when only
`Differences` exists.
2021-03-31 17:58:44 +03:00
Jonas Jenwald
db1e1612df [api-minor] Support proper URL-objects, in addition to URL-strings, in getDocument
Currently only URL-strings are officially supported by `getDocument`, however at this point in time I cannot really see any compelling reason to not support `URL`-objects as well.

Most likely the reason that we've don't already support `URL`-objects, in `getDocument`, is that historically `URL` wasn't fully implemented across browsers and our old polyfill wasn't perfect; see https://developer.mozilla.org/en-US/docs/Web/API/URL/URL#browser_compatibility

*Please note:* Because of how the `url` parameter is currently handled, there's actually *some* cases where passing a `URL`-object to `getDocument` already works. That, in my opinion, provides additional motivation for supporting `URL`-objects officially, since it makes the API more consistent.

The following is an attempt to summarize the *current* situation, based on the actual code rather than the JSDocs:
 - `getDocument("url string")` works and is documented.[1]
 - `getDocument({ url: "url string", })` works and is documented.[1]
 - `getDocument(new URL(...))` throws immediately, since no supported parameters are found.
 - `getDocument({ url: new URL(...), })` actually works even though it's not documented.[1] Originally, when data was fetched on the worker-thread, this would likely have thrown since `URL` isn't clonable.[2]
 - `getDocument({ url: { abc: 123, }, })`, or some similarily meaningless input, will be "accepted" by `getDocument` and then throw a `MissingPDFException` when attempting to fetch the bogus data.

With the changes in this patch, not only is `URL`-objects now officially supported and documented when calling `getDocument`, but we'll also do a much better job at actually validating any URL-data passed to `getDocument` (and instead fail early).

---
[1] In *browsers*, we create a valid URL thus indirectly validating the input. In Node.js environments, on the other hand, no validation is done since obtaining a baseUrl is more difficult (and PDF.js is primarily written for browsers anyway).

[2] https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#supported_types
2021-03-31 16:21:41 +02:00
calixteman
b3528868c1
XFA - Add support for few ui elements (#13115)
- input;
  - layout;
  - border;
  - margin;
  - color.
2021-03-31 15:42:21 +02:00
calixteman
84d7cccb1d
JS - Handle correctly hierarchy of fields (#13133)
* JS - Handle correctly hierarchy of fields
  - it aims to fix #13132;
  - annotations can inherit their actions from the parent field;
  - there are some fields which act as a container for other fields:
    - they can be access through js so need to add them with an empty type (nothing in the spec about that but checked in Acrobat);
    - calculation order list (CO) can reference them so need make them through this.getField;
    - getArray method must return kids.
  - field values are number, string, ... depending of their type but nothing in the spec on how to know what's the type:
    - according to the comment for Canonical Format: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=461
    - it seems that this "type" can be guessed from js action Format (when setting a type in Acrobat DC, the only affected thing is this action).
  - util.scand with an empty string returns the current date.
2021-03-30 08:50:35 -07:00
Jonas Jenwald
75a6b2fa13
Improve handling of *linked* test-cases for the unit/integration suites (#13160)
- Actually support *linked* test-cases in the integration-tests (in the same way as the unit-tests).

 - Add a new `"type": "other"`-kind to the test-manifest, to support *linked* test-cases in the unit/integration-tests without requiring the PDF document in question to also be a reference-test.
2021-03-30 13:24:04 +02:00
Calixte Denizet
9296ee6986 Skip extra objects in object stream in using offsets 2021-03-28 13:03:05 +02:00
calixteman
81c602c61c
Set CFF header to 4 when writing it because it contains 4 elements (#13149) 2021-03-26 18:23:18 +01:00
calixteman
63471bcbbe
XFA - Convert some template properties into CSS ones (#13082)
- implement few positioning properties: position, width, height, anchor;
  - implement font element;
  - implement fill element (used by font) and its children (linear, radial, ...);
  - font property is inherited from ancestor container (see https://www.pdfa.org/wp-content/uploads/2020/07/XFA-3_3.pdf#page=43) so let CSS handles that stuff;
  - in order to reduce the number of properties to set, only set non default properties and put the default in CSS;
  - set a background to some containers to be able to see them (will be removed in a future commit).
2021-03-25 13:02:39 +01:00
Jonas Jenwald
eeda2215d7 Remove redundant done-callback functions from unit-tests which are async
For unit-tests which are asynchronous, using a `done`-callback is redundant and future Jasmine versions will stop supporting that pattern.
2021-03-21 11:33:39 +01:00
Tim van der Meij
8269ddbd16
Merge pull request #13105 from Snuffleupagus/BasePdfManager-parseDocBaseUrl
Improve memory usage around the `BasePdfManager.docBaseUrl` parameter (PR 7689 follow-up)
2021-03-19 23:03:20 +01:00
calixteman
24e598a895
XFA - Add a layer to display XFA forms (#13069)
- add an option to enable XFA rendering if any;
  - for now, let the canvas layer: it could be useful to implement XFAF forms (embedded pdf in xml stream for the background and xfa form for the foreground);
  - ui elements in template DOM are pretty close to their html counterpart so we generate a fake html DOM from template one:
    - it makes easier to translate template properties to html ones;
    - it makes faster the creation of the html element in the main thread.
2021-03-19 10:11:40 +01:00
Jonas Jenwald
bd9dee1544 Move the getPdfFilenameFromUrl helper function from web/ui_utils.js and into src/display/display_utils.js
It seems reasonable to place this alongside the *similar* `getFilenameFromUrl` helper function. This way, with the changes in the next patch, we also avoid having to expose the `isDataScheme` function in the API itself and we instead expose `getPdfFilenameFromUrl` in the API (which feels overall more appropriate).
2021-03-17 15:48:24 +01:00
Jonas Jenwald
5099f1977f Support LineAnnotations with empty /Rect-entries (issue 6564)
This extends PR 13033 slightly, with a heuristic to support corrupt PDF documents where the `LineAnnotation`s have an empty /Rect-entry. Please note that while I have no idea if this is "correct", this patch at least makes us output the same /BBox as re-saving in Adobe Reader does.
2021-03-15 16:33:43 +01:00
Tim van der Meij
cc59c81fe6
Merge pull request #13096 from Snuffleupagus/eslint-no-var-stats
Enable the ESLint `no-var` rule in the `test/stats/` folder
2021-03-14 11:34:06 +01:00
Jonas Jenwald
7ec2bd0f01 Enable the ESLint no-var rule in test/add_test.js
These changes were done automatically, by using the `gulp lint --fix` command.
2021-03-14 10:25:51 +01:00
Jonas Jenwald
473f0aeeb2 Enable the ESLint no-var rule in the test/stats/ folder
Note that the majority of these changes were done automatically, by using the `gulp lint --fix` command, and the manual changes were limited to the following diff:

```diff
diff --git a/test/stats/statcmp.js b/test/stats/statcmp.js
index 7c4dbf1d3..22d535a5a 100644
--- a/test/stats/statcmp.js
+++ b/test/stats/statcmp.js
@@ -1,13 +1,7 @@
 "use strict";

 const fs = require("fs");
-
-try {
-  var ttest = require("ttest");
-} catch (e) {
-  console.log('\nttest is not installed -- to intall, run "npm install ttest"');
-  console.log("Continuing without significance test...\n");
-}
+const ttest = require("ttest");

 const VALID_GROUP_BYS = ["browser", "pdf", "page", "round", "stat"];

@@ -134,9 +128,7 @@ function stat(baseline, current) {
   if (ttest) {
     labels.push("Result(P<.05)");
   }
-  let i,
-    row,
-    rows = [];
+  const rows = [];
   // collect rows and measure column widths
   const width = labels.map(function (s) {
     return s.length;
@@ -146,7 +138,7 @@ function stat(baseline, current) {
     const key = keys[k];
     const baselineMean = mean(baselineGroup[key]);
     const currentMean = mean(currentGroup[key]);
-    row = key.split(",");
+    const row = key.split(",");
     row.push(
       "" + baselineGroup[key].length,
       "" + Math.round(baselineMean),
@@ -165,7 +157,7 @@ function stat(baseline, current) {
         row.push("");
       }
     }
-    for (i = 0; i < row.length; i++) {
+    for (let i = 0; i < row.length; i++) {
       width[i] = Math.max(width[i], row[i].length);
     }
     rows.push(row);
@@ -181,8 +173,8 @@ function stat(baseline, current) {
   console.log("-- Grouped By " + options.groupBy.join(", ") + " --");
   const groupCount = options.groupBy.length;
   for (let r = 0; r < rows.length; r++) {
-    row = rows[r];
-    for (i = 0; i < row.length; i++) {
+    const row = rows[r];
+    for (let i = 0; i < row.length; i++) {
       row[i] = pad(row[i], width[i], i < groupCount ? "right" : "left");
     }
     console.log(row.join(" | "));
@@ -208,5 +200,5 @@ function main() {
   stat(baseline, current);
 }

-var options = parseOptions();
+const options = parseOptions();
 main();
```
2021-03-14 10:15:45 +01:00
Jonas Jenwald
5b5061afa8 Enable the ESLint no-var rule globally
A significant portion of the code-base has now been converted to use `let`/`const`, rather than `var`, hence it should be possible to simply enable the ESLint `no-var` rule globally.
This way we can ensure that new code won't accidentally use `var`, and it also removes the need to manually enable the rule in various folders.

Obviously it makes sense to continue the efforts to replace `var`, but that should probably happen on a file and/or folder basis.

Please note that this patch excludes the following code:
 - The `extensions/` folder, since that seemed easiest for now (and I don't know exactly what the support situation is for the Chromium-extension).

 - The entire `external/` folder is ignored, since most of it's currently excluded from linting.
   For the code that isn't imported from elsewhere (and should be ignored), we should probably (at some point) bring the code up to the same linting/formatting standard as the rest of the code-base.

 - Various files in the `test/` folder are ignored, as necessary, since the way that a lot of this code is loaded will require some care (or perhaps larger re-factoring) when removing `var` usage.
2021-03-13 16:12:53 +01:00
Jonas Jenwald
22e0ed51c6 Remove unnecessary /* eslint no-var: error */ lines in the test/unit/ folder (PR 12528 follow-up)
These lines are no longer needed, since the ESLint `no-var` rule has been enabled in the entire folder.
2021-03-13 11:50:11 +01:00
Jonas Jenwald
39cd844243 Ensure that *all* errors are handled in rasterizeTextLayer/rasterizeAnnotationLayer
Currently errors occurring within the `src/display/{text_layer, annotation_layer}.js` files are not being handled properly by the test-suite, and the tests simply time out rather than failing as intended.
This makes it *very* easy to accidentally overlook a certain type of errors, see e.g. https://github.com/mozilla/pdf.js/pull/13055#discussion_r589005041, which this patch will thus prevent.
2021-03-12 14:05:53 +01:00
Brendan Dahl
47a5550f10 Disable intermittent unit test "creates pdf doc from non-existent URL"
Disable this test so we don't have to manually review unit test
failure log all the time.
2021-03-11 11:40:40 -08:00
Calixte Denizet
3243672727 XFA - Create Form DOM in merging template and data trees
- Spec: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=171;
  - support for the 2 ways of merging: consumeData and matchTemplate;
  - create additional nodes in template DOM when occur node allows it;
  - support for global values in data DOM.
2021-03-08 14:10:30 +01:00
Jonas Jenwald
36bb4fa823 Remove the test/features folder, since it's very out of date (issue 11954)
These tests, and their [accompanying Wiki page](https://github.com/mozilla/pdf.js/wiki/Required-Browser-Features), haven't received any real updates for *many years* and are sufficiently out of date to be effectively useless now.
Providing *irrelevant* compatibility information seems overall worse than not providing any information, and as suggested in the issue it'd probably be better to use https://github.com/mozilla/pdf.js#online-demo for checking if a particular platform/browser is supported.

Thanks to version control, it's easy to restore these files should the need ever arise. However, re-introducing these tests would essentially require updating every single test-case *and* a commitment to keeping them up to date with future code changes.
2021-03-07 13:41:30 +01:00
Calixte Denizet
c01ef24541 JS - reset correctly radio buttons 2021-03-07 11:04:40 +01:00
Tim van der Meij
5828ff6cb0
Implement rendering line annotations without appearance stream 2021-02-28 18:57:58 +01:00
Tim van der Meij
fa6cebf045
Implement rendering square/circle annotations without appearance stream 2021-02-27 19:05:12 +01:00
Calixte Denizet
17363bbc6f Fix integration test with js-colors
- need to wait for color change to check its value.
2021-02-26 11:40:02 +01:00
calixteman
45329af926
XFA -- Add support for SOM expressions (#12983)
- specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=87;
 - add a parser for SOM expressions;
 - add search functions to resolve those expressions;
 - search functions will be used to bind data into template.
2021-02-24 10:13:02 +01:00
Tim van der Meij
f3aa4408a5
Merge pull request #13005 from calixteman/colors
JS - Fix setting a color on an annotation
2021-02-21 14:50:03 +01:00
Calixte Denizet
4a5f1d1b7a JS - Fix setting a color on an annotation
- strokeColor corresponds to borderColor;
 - support fillColor and textColor;
 - support colors on the different annotations;
 - fix typo in aforms (+test).
2021-02-20 15:24:37 +01:00
Jonas Jenwald
e9038cc3d1 Send the AnnotationStorage-data to the worker-thread as a Map
Rather than converting the `AnnotationStorage`-data to an Object, before sending it to the worker-thread, we should be able to simply send the internal `Map` directly.
The "structured clone algorithm" doesn't have a problem with `Map`s, however the `LoopbackPort` used when workers are *disabled* (e.g. in Node.js environments) didn't use to support them. With PR 12997 having lifted that restriction, we should now be able to simply send the `AnnotationStorage`-data as-is rather than having to iterate through it to first create an Object.

*Please note:* The changes in `src/core/annotation.js` could have been a lot more compact if we were able to use optional chaining in the `src/core` folder. Unfortunately that's still not possible, since SystemJS is being used in the development viewer (i.g. `gulp server`) and fixing that is *still* blocked by [bug 1247687](https://bugzilla.mozilla.org/show_bug.cgi?id=1247687).
2021-02-18 17:13:43 +01:00
calixteman
0fa9976268
XFA - Add support for prototypes (#12979)
- specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=225&zoom=auto,-207,784
 - add a clone method on nodes in order to be able to clone a proto;
 - support ids in template namespace;
 - prevent from cycle when applying protos.
2021-02-18 10:32:25 +01:00
Tim van der Meij
4619b1b568
Merge pull request #12997 from Snuffleupagus/metadata-worker
Move the Metadata parsing to the worker-thread
2021-02-17 20:57:46 +01:00
calixteman
b5be515375
XFA - Add a lexer/parser for FormCalc language (#12936)
- the language specifications are: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1049
 - it can be used to:
   * as a scripting language for calculation, validations, ...
   * in SOM expressions to select nodes: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=101
2021-02-17 20:28:06 +01:00
Jonas Jenwald
d366bbdf51 Move the encodeToXmlString helper function to src/core/core_utils.js
With the previous patch this function is now *only* accessed on the worker-thread, hence it's no longer necessary to include it in the *built* `pdf.js` file.
2021-02-17 13:12:01 +01:00
Jonas Jenwald
b66f294f64 Move the XML-parser to the src/core/-folder
With the previous patch this functionality is now *only* accessed on the worker-thread, hence it's no longer necessary to include it in the *built* `pdf.js` file.
2021-02-17 13:12:01 +01:00
Jonas Jenwald
cc3a6563ee Move the Metadata parsing to the worker-thread
The only reason, as far as I can tell, for parsing the Metadata on the main-thread is how it was originally implemented. When Metadata support was first implemented, it utilized the [`DOMParser`](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser) which isn't available in workers.
Today, with the custom XML-parser being used, that's no longer an issue and it seems reasonable to move the Metadata parsing to the worker-thread[1], since that's where all parsing should happen (for performance reasons).

Based on these changes, we'll be able to reduce the now unnecessary duplication of the XML-parser (and related code) in both of the *built* `pdf.js`/`pdf.worker.js` files.

Finally, this patch changes the `_repair` method to use "Array + join" rather than string concatenation.

---
[1] This needed the previous patch, to enable sending of `Map`s between threads with workers disabled.
2021-02-17 13:12:01 +01:00
Calixte Denizet
ccef734ebb Remove Promise.all and async+done from unit/scripting_spec 2021-02-17 11:19:39 +01:00
Calixte Denizet
82f75a8ac2 JS -- Fix doc.getField and add missing field methods
- getField("foo") was wrongly returning a field named "foobar";
 - field object had few missing unimplemented methods
2021-02-17 10:42:52 +01:00
Tim van der Meij
bab059d8fd
Merge pull request #12964 from calixteman/12963
Avoid infinite loop when getting annotation field name
2021-02-16 22:36:24 +01:00
Calixte Denizet
0fc8267576 Avoid infinite loop when getting annotation field name
- aims to fix issue #12963;
 - use a Set to track already visited objects;
 - remove the loop limit in getInheritableProperty and use a RefSet too.
2021-02-14 19:58:19 +01:00
Jonas Jenwald
b26c7974fe [api-minor] Change the dc:subject Metadata field to an Array
This patch simply extends the existing handling of the `dc:creator` field, which should hopefully suffice here; please refer to https://wwwimages2.adobe.com/content/dam/acom/en/devnet/xmp/pdfs/XMP%20SDK%20Release%20cc-2016-08/XMPSpecificationPart1.pdf#page=34
2021-02-14 17:16:40 +01:00
Calixte Denizet
ea06bb0e36 [api-minor] Annotation -- Don't compute appearance when nothing has changed
* don't set a value in annotationStorage by default:
   - having an undefined when the annotation is rendered for saving/printing means nothing has changed so use normal appearance
   - aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1681687
 * change the way to compute font size when this one is null in DA:
   - make fontSize proportional to line height
   - in multiline case, take into account the number of lines for text entered to adapt the font size
2021-02-12 19:27:21 +01:00
calixteman
a8021208ea
Restore window.alert after use in scripting test (#12987) 2021-02-12 14:19:58 +01:00
dhufnagel
fc925827b2
fix initial state of checkboxes in display layer (#12904)
consider the export value when multiple checkboxes have the same name
2021-02-12 11:22:54 +01:00
Jonas Jenwald
4733f163e8 Replace a few new Date().getTime() instances with Date.now()
The former format is not only more verbose, but it's also *slightly* less efficient since it creates a new `Date` object.
2021-02-11 23:00:42 +01:00
calixteman
0479deef4e
XFA -- Add other objects (#12949)
- connectionSet: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=969
 - datasets: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1038
  - signature: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1040
  - stylesheet: the same
  - xhtml: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=1187
2021-02-11 12:30:37 +01:00
Jonas Jenwald
0068dba009 [api-minor] Rename -es5 to -legacy, to reduce confusion over what's actually supported (issue 12976)
*Please note that this will also require some edits of the Wiki.*
2021-02-10 16:01:59 +01:00
Jonas Jenwald
d3e65f24e3 Request all data, rather than throwing, when encountering general errors in ObjectLoader._walk (issue 9462, PR 3289 follow-up)
*As far as I can tell, this has been broken ever since PR 3289 (back in 2013) without anyone noticing.*

For any non-`MissingDataException` errors encountered in `ObjectLoader._walk`, we're simply throwing immediately which thus has the potential to *completely* break rendering of an entire page.
In practice this is obviously only an issue for PDF documents which are in one way or another corrupt, since that's the only way that `XRef.fetch` will throw non-`MissingDataException` errors. To make matters worse these errors are *intermittent*, since they can only occur if the document is still loading when the `ObjectLoader`-code runs (note the early return in `ObjectLoader.load`).

Please note that we cannot simply catch the error and let "normal" parsing continue in `ObjectLoader._walk`, since that could lead to errors elsewhere given that resources "below" the current one (in the graph) might not be checked as intended then.
All-in-all, the only way to make absolutely sure that we won't cause *unexpected* `MissingDataException`s somewhere else in the code-base is to fallback to fetching the *entire* document in this edge-case.
2021-02-06 14:33:50 +01:00
Calixte Denizet
652ff57897 XFA -- Add template object
- Specifications: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.364.2157&rep=rep1&type=pdf#page=596
2021-02-03 21:05:10 +01:00
Jonas Jenwald
cacb1cc7ba Re-enable the issue6961 test-case (issue 7112) 2021-02-02 10:31:16 +01:00
Calixte Denizet
0ff5cd7eb5 XFA - Add a parser for XFA files
- the parser is base on a class extending XMLParserBase
 - it handle xml namespaces:
   * each namespace is assocated with a builder
   * builder builds nodes belonging to the namespace
   * when a node is inserted in the parent namespace compatibility is checked (if required)
 - to avoid name collision between xml names and object properties, use Symbol.
2021-02-01 13:45:31 +01:00
Tim van der Meij
286271152f
Merge pull request #12910 from calixteman/bidi
Add back dir property in spans in text layer
2021-01-27 22:09:00 +01:00
Tim van der Meij
639437d287
Merge pull request #12911 from calixteman/reg_test
Fix text layer regression tests in using the correct line-height property
2021-01-26 23:45:45 +01:00
Calixte Denizet
539256c351 Add back dir property in spans in text layer
- aims to fix #12909
2021-01-26 12:00:05 +01:00
calixteman
a3f6882b06
JS -- add support for choice widget (#12826) 2021-01-25 23:40:57 +01:00
Calixte Denizet
52641e8643 Fix text layer regression tests in using the correct line-height property 2021-01-25 23:01:07 +01:00
Tim van der Meij
f2c7338b02
Merge pull request #12897 from calixteman/12895
JS - Fix mouse event names
2021-01-24 12:28:24 +01:00
Calixte Denizet
34d2e72df2 JS - Fix mouse event names
- fix issue #12895
2021-01-23 20:26:22 +01:00
Tim van der Meij
d4c4f5d4e5
Merge pull request #12870 from Snuffleupagus/page-advance
Add previous/next-page functionality that takes scroll/spread-modes into account (issue 11946)
2021-01-23 19:35:08 +01:00
Tim van der Meij
25b84ce84c
Merge pull request #12828 from dhufnagel/feature/annotation_layer_display_fontsize
[api-minor] Set font size and color for text widget annotations
2021-01-23 16:08:07 +01:00
Jonas Jenwald
ef1d33a29e Use slightly less verbose font-names in the "Default appearance" unit-tests
The new names are not only less verbose, but also uses a *very* common PDF font-naming convention.
2021-01-23 15:34:22 +01:00
Jonas Jenwald
6bcb4e3ad9 Ensure that parseDefaultAppearance won't attempt to access a not yet defined variable (PR 12831 follow-up)
Note how, in the `if (this.stateManager.stateStack.length !== 0) {` branch, we're attempting to access the not yet defined variable[1] `args`. If this code-path is ever hit, an Error will be thrown and parsing will thus be aborted immediately (likely leading to e.g. rendering bugs).

Note that I found this purely by accident, since I happened to glance at the LGTM report. However, I've since found that the error is also present during the unit-test[2] and with this patch we're actually testing the *intended* thing here.

As part of fixing this, and to avoid re-introducing a similar bug in the future, we'll now instead always reset `args.length` *before* attempting to read the next operator.
Also, we can use the existing `EvaluatorPreprocessor.savedStatesDepth` getter to simplify the save/restore detection a tiny bit.

---
[1] The ESLint rule `no-use-before-define` would have helped catch this problem, but unfortunately we cannot enable that without quite a bit of refactoring all over the code-base.

[2] The unit-test was updated such that it would fail in the `master`-branch.
2021-01-23 15:33:28 +01:00
Dominik Hufnagel
c5083cda02 set font size and color on annotation layer
use the default appearance to set the font size and color of a text
annotation widget
2021-01-22 23:12:14 +01:00
Jonas Jenwald
a2b592f4a2 Add previous/next-page functionality that takes scroll/spread-modes into account (issue 11946)
- For wrapped scrolling, we unfortunately need to do a fair bit of parsing of the *current* page layout. Compared to e.g. the spread-modes, where we can easily tell how the pages are laid out, with wrapped scrolling we cannot tell without actually checking. In particular documents with varying page sizes require some care, since we need to check all pages on the "row" of the current page are visible and that there aren't any "holes" present. Otherwise, in the general case, there's a risk that we'd skip over pages if we'd simply always advance to the previous/next "row" in wrapped scrolling.

 - For horizontal scrolling, this patch simply maintains the current behaviour of advancing *one* page at a time. The reason for this is to prevent inconsistent behaviour for the next and previous cases, since those cannot be handled identically. For the next-case, it'd obviously be simple to advance to the first not completely visible page. However for the previous-case, we'd only be able to go back *one* page since it's not possible to (easily) determine the page layout of non-visible pages (documents with varying page sizes being a particular issue).

 - For vertical scrolling, this patch maintains the current behaviour by default. When spread-modes are being used, we'll now attempt to advance to the next *spread*, rather than just the next page, whenever possible. To prevent skipping over a page, this two-page advance will only apply when both pages of the current spread are visible (to avoid breaking documents with varying page sizes) and when the second page in the current spread is fully visible *horizontally* (to handle larger zoom values).

In order to reduce the performance impact of these changes, note that the previous/next-functionality will only call `getVisibleElements` for the scroll/spread-modes where that's necessary and that "normal" vertical scrolling is thus unaffected by these changes.

To support these changes, the `getVisibleElements` helper function will now also include the `widthPercent` in addition to the existing `percent` property.
The `PDFViewer._updateHelper` method is changed slightly w.r.t. updating the `currentPageNumber` for the non-vertical/spread modes, i.e. won't affect "normal" vertical scrolling, since that helped simplify the overall calculation of the page advance.

Finally, these new `BaseViewer` methods also allow (some) simplification of previous/next-page functionality in various viewer components.

*Please note:* There's one thing that this patch does not attempt to change, namely disabling of the previous/next toolbarButtons respectively the firstPage/lastPage secondaryToolbarButtons. The reason for this is that doing so would add quite a bit of complexity in general, and if for some reason `BaseViewer._getPageAdvance` would get things wrong we could end up incorrectly disabling the buttons. Hence it seemed overall safer to *not* touch this, and accept that the buttons won't be `disabled` despite in some edge-cases no further scrolling being possible.
2021-01-22 21:38:15 +01:00
Jonas Jenwald
4db7330677 Enable ESLint rules that no longer need to be disabled on a directory/file-basis
Given that browsers/environments without native support for both arrow functions and object shorthand properties are no longer supported in PDF.js, please refer to the compatibility information below, we can now enable a fair number of ESLint rules and also simplify/remove some `.eslintrc` files.

With the exception of the `no-alert` cases, all code changes were made automatically by using `gulp lint --fix`.

 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Object_initializer#browser_compatibility
2021-01-22 17:47:03 +01:00
Brendan Dahl
2cba290361
Merge pull request #12836 from calixteman/update_buttons
JS -- update radio/checkbox values even if there are no actions
2021-01-21 14:00:26 -08:00
calixteman
1039698697
Add a parser to get font data from the default appearance (#12831)
* Add a parser to get font data from the default appearance
 - pdfium & poppler use a special parser too to get these info.

* Update src/core/default_appearance.js

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>

Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
2021-01-21 20:15:31 +01:00
Brendan Dahl
f45ba02fd3
Merge pull request #12850 from calixteman/missing_cstes
JS -- Add few missing constants in global scope
2021-01-20 11:33:02 -08:00
Calixte Denizet
0d1b19632d Enforce linewidth to 1px when at least one of scale factor is lower than 1 2021-01-15 13:18:24 +01:00
Jonas Jenwald
cf7eb87934 Remove a duplicated reference test (PR 12812 follow-up)
- Remove a *duplicated* reference test, see "issue12810", from the manifest.

 - Improve the spelling in a couple of comments in `src/core/canvas.js`, most notable of the word "parallelogram".

 - Update a comment, also in `src/core/canvas.js`, to actually agree with the value used to reduce confusion when reading the code.
2021-01-15 10:57:15 +01:00
Brendan Dahl
6619f1f3f2
Merge pull request #12812 from calixteman/too_thin
Enforce line width to be at least 1px after applied transform
2021-01-14 15:21:44 -08:00
Jonas Jenwald
2600e59acb Always re-measure non-embedded ArialNarrow fonts (bug 1671312, PR 12725 follow-up)
While PR 12725 fixed bug 1671312 as reported, i.e. the "In the upper right corner "Purposes' has bad kerning."-part, it however broke other parts of the text rendering.
Note in particular the tables, e.g. on page 2 and beyond, where the glyphs are now rendered too close together. The reason for this is that the fonts in question are non-embedded ArialNarrow, which we just replace with Helvetica which obviously is not narrow. Given that the font replacement isn't a perfect fit for non-embedded ArialNarrow, we still need to re-measure the glyph widths in this case.
2021-01-14 15:51:48 +01:00
Ross Johnson
6dae2677d5 [api-minor] Highlight search results correctly for normalized text (PR 9448)
This patch is a rebased *and* refactored version of PR 9448, such that it applies cleanly given that `PDFFindController` has changed since that PR was opened; obviously keeping the original author information intact.

This patch will thus ensure that e.g. fractions, and other things that we normalize before searching, will still be highlighted correctly in the textLayer.

Furthermore, this patch also adds basic unit-tests for this functionality.

*Note:* The `[api-minor]` tag is added, since third-party implementations of the `PDFFindController` must now always use the `pageMatchesLength` property to get accurate length information (see the `web/text_layer_builder.js` changes).

Co-authored-by: Ross Johnson <ross@mazira.com>
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
2021-01-12 18:08:08 +01:00
calixteman
1de1ae0be6
Merge pull request #12838 from calixteman/authors
[api-minor] Change the "dc:creator" Metadata field to an Array
2021-01-12 02:44:58 -08:00
Calixte Denizet
43d5512f5c [api-minor] Change the "dc:creator" Metadata field to an Array
- add scripting support for doc.info.authors
 - doc.info.metadata is the raw string with xml code
2021-01-11 21:34:07 +01:00
Calixte Denizet
8e6bec6e2e JS -- Add few missing constants in global scope
- these constants are available in pdfium implementation too
 - fix error code in aform.js
2021-01-11 17:19:28 +01:00
Calixte Denizet
b3dccd66ab Enforce line width to be at least 1px after applied transform
* add a comment to explain how minimal linewidth is computed.
 * when context.linewidth < 1 after transform, firefox and chrome
   don't render in the same way (issue #12810).
 * set lineWidth to 1 after transform and before stroking
   - aims fix issue #12295
   - a pixel can be transformed into a rectangle with both heights < 1.
     A single rescale leads to a rectangle with dim equals to 1 and
     the other to something greater than 1.
 * change the way to render rectangle with null dimensions:
   - right now we rely on the lineWidth set before "re" but
     it can be set after "re" and before "S" and in this case the rendering
     will be wrong.
   - render such rectangles as a single line.
2021-01-10 18:02:12 +01:00
Jonas Jenwald
246a6f9d13 Enable the Stylelint length-zero-no-unit rule
Note that these changes were done automatically, using `gulp lint --fix`.
With this rule, we'll thus enforce a *consistent* formatting of zero-lengths in our CSS files.

Please find additional details about the Stylelint rule at https://stylelint.io/user-guide/rules/length-zero-no-unit
2021-01-10 14:09:36 +01:00
Tim van der Meij
f85b8721d1
Merge pull request #12842 from Snuffleupagus/issue-12841
Improve handling of JPEG images without an EOI marker (issue 12841)
2021-01-10 13:21:28 +01:00
Tim van der Meij
699d65eb1c
Merge pull request #12840 from Snuffleupagus/sort-exports
Use ESLint to ensure that `export`s are sorted alphabetically
2021-01-10 13:17:28 +01:00
Jonas Jenwald
66b2c19368 Fix broken "issue12394" test-case
This test-case is currently broken, with the reference image being completely empty, since it uses the old "annotationStorage" format in the manifest.
2021-01-09 21:42:56 +01:00
Jonas Jenwald
81525fd446 Use ESLint to ensure that exports are sorted alphabetically
There's built-in ESLint rule, see `sort-imports`, to ensure that all `import`-statements are sorted alphabetically, since that often helps with readability.
Unfortunately there's no corresponding rule to sort `export`-statements alphabetically, however there's an ESLint plugin which does this; please see https://www.npmjs.com/package/eslint-plugin-sort-exports

The only downside here is that it's not automatically fixable, but the re-ordering is a one-time "cost" and the plugin will help maintain a *consistent* ordering of `export`-statements in the future.
*Note:* To reduce the possibility of introducing any errors here, the re-ordering was done by simply selecting the relevant lines and then using the built-in sort-functionality of my editor.
2021-01-09 20:37:51 +01:00
Jonas Jenwald
cd9422a075 Improve handling of JPEG images without an EOI marker (issue 12841)
Given that the PDF document in the issue contains the same very large JPEG image *three* times, this patch includes a test-case where only the first page has been extracted from it.
2021-01-09 20:19:39 +01:00
Tim van der Meij
c0a6d6cd21
Merge pull request #12394 from calixteman/appearance
In a text widget, Font resources can be in the appearance
2021-01-08 21:03:41 +01:00
Jonas Jenwald
941b65f683 Remove unncessary CanvasFactory/CMapReaderFactory/FileReaderFactory duplication in unit-tests
Given that the API will now, after PR 12039, automatically pick the correct factories to use depending on the environment (browser vs. Node.js), we can utilize that in the unit-tests as well. This way we don't have to manually repeat the same initialization code in *multiple* unit-tests.
*Note:* The *official* PDF.js API is defined in `src/pdf.js`, hence the new exports in `src/display/api.js` will not affect that.

Also, updates the unit-test `FileReaderFactory` helpers similarily.

*Drive-by change:* Fix the `CMapReaderFactory` usage in the annotation unit-tests, since the cache should only contain raw data and not a Promise. While this obviously works as-is, having unit-tests that "abuse" the intended data format can easily lead to unnecessary failures if changes are made to the relevant `src/core/` code.
2021-01-08 17:33:59 +01:00
Calixte Denizet
7172f0a928 JS -- update radio/checkbox values even if there are no actions 2021-01-08 16:43:16 +01:00
Calixte Denizet
83119b9000 In a text widget, Font resources can be in the appearance 2021-01-08 10:13:47 +01:00
Tim van der Meij
048081fb69
Merge pull request #12824 from Snuffleupagus/preEvaluateFont-errors
Improve the handling of errors, in `PartialEvaluator.loadFont`, occuring in `PartialEvaluator.preEvaluateFont` (issue 12823)
2021-01-07 23:15:41 +01:00
Tim van der Meij
5bde4b71f8
Merge pull request #12292 from calixteman/encoding
Fix encoding issues when printing/saving a form with non-ascii characters
2021-01-07 22:56:42 +01:00
Jonas Jenwald
78c32c2697 Improve the handling of errors, in PartialEvaluator.loadFont, occuring in PartialEvaluator.preEvaluateFont (issue 12823)
Currently any errors thrown in `preEvaluateFont`, which is a *synchronous* method, will not be handled at all in the `loadFont` method and we were thus failing to return an `ErrorFont`-instance as intended here.

Also, add an *explicit* check in `PartialEvaluator.preEvaluateFont` to ensure that Type0-fonts always have a *valid* dictionary.
2021-01-07 11:38:38 +01:00
Calixte Denizet
6523f8880b JS -- Plug PageOpen and PageClose actions 2021-01-06 13:31:15 +01:00
Calixte Denizet
56424967f2 Fix encoding issues when printing/saving a form with non-ascii characters 2021-01-05 17:23:18 +01:00
Calixte Denizet
a3c2d651ef Disable a test using "pending" function 2021-01-03 19:06:27 +01:00
Tim van der Meij
ca18af6af3
Merge pull request #12774 from calixteman/doc_action_test
JS -- Add tests for print/save actions
2021-01-03 18:46:37 +01:00
Jonas Jenwald
739d7c6d77 Support the once option, when registering EventBus listeners
This follows the same principle as the `once` option that exists in the native `addEventListener` method, and will thus automatically remove an `EventBus` listener when it's invoked; see https://developer.mozilla.org/en-US/docs/Web/API/EventTarget/addEventListener#Parameters

Finally, this patch also tweaks some the existing `EventBus`-code to use modern features such as optional chaining and logical assignment operators.
2020-12-29 16:49:13 +01:00
Tim van der Meij
50303fc8f4
Merge pull request #12766 from Snuffleupagus/issue-11004
Ignore, rather than throwing on, unsupported Coding style default (COD) options in JPEG 2000 images (issue 11004)
2020-12-28 20:26:10 +01:00
Calixte Denizet
ffd4bc790c JS -- Add tests for print/save actions
* change PDFDocument::hasJSActions to return true when there are JS actions in catalog.
2020-12-24 18:51:00 +01:00
Brendan Dahl
d060cd2a61
Merge pull request #12637 from calixteman/buttons
JS -- Add support for buttons
2020-12-22 22:18:37 -08:00
Calixte Denizet
9dc331ec62 Remove timeout in annotation integration test 2020-12-22 16:50:28 +01:00
Calixte Denizet
7c3facb174 JS -- Add support for buttons
* radio buttons
 * checkboxes
2020-12-22 16:41:51 +01:00
Jonas Jenwald
cffb7af3b0 Ignore, rather than throwing on, unsupported Coding style default (COD) options in JPEG 2000 images (issue 11004)
Similar to other markers that we currently skip, by ignoring unsupported Coding style default (COD) options we'll at least render *something* here (although some JPEG 2000 images may look slightly wrong).
Note that if the unsupported COD options lead to additional errors, during parsing, we'll still abort parsing of the JPEG 2000 image.
2020-12-21 20:35:52 +01:00
Brendan Dahl
3ea1c43b15
Merge pull request #12751 from calixteman/da_not_a_string
Add a default DA for textfield to avoid issues when printing or saving
2020-12-21 09:44:08 -08:00
Calixte Denizet
a7c682c600 Add a default DA for textfield to avoid issues when printing or saving
* it aims to fix issue #12750
2020-12-19 23:38:45 +01:00
Jonas Jenwald
6f40f4e7c2 Remove the arbitrary timeout in the "must check that first text field has focus" integration-test (PR 12702 follow-up)
It seems that the timeout is way too short in practice, since this new integration-test failed *intermittently* already in PR 12702 (which is where the test was added).

The ideal solution here would be to simply await an event, dispatched by the viewer, however that unfortunately doesn't appear to be supported by Puppeteer.
Instead, the solution implemented here is to add a new method in `PDFViewerApplication` which Puppeteer can query to check if the scripting/sandbox has been fully initialized.
2020-12-19 09:32:58 +01:00
calixteman
e6e2809825
Merge pull request #12702 from calixteman/doc_actions
JS - Collect and execute actions at doc level
2020-12-18 21:33:32 +01:00
Tim van der Meij
b7fc916f48
Merge pull request #12753 from Snuffleupagus/issue-12752
Ignore, rather than throwing on, Coding style component (COC) markers in JPEG 2000 images (issue 12752)
2020-12-18 21:14:06 +01:00
Calixte Denizet
1e2173f038 JS - Collect and execute actions at doc and pages level
* the goal is to execute actions like Open or OpenAction
 * can be tested with issue6106.pdf (auto-print)
 * once #12701 is merged, we can add page actions
2020-12-18 20:03:59 +01:00
Jonas Jenwald
48a76aea2b Ignore, rather than throwing on, Coding style component (COC) markers in JPEG 2000 images (issue 12752)
Similar to other markers that we currently skip, by ignoring the Coding style component (COC) marker we'll at least prevent outright errors (although some JPEG 2000 images may look slightly wrong).
2020-12-18 18:18:32 +01:00
Calixte Denizet
167ff1a7fc JS -- Actions must be evaluated in global scope
* All the public properties of doc are injected into globalThis, in order to make them available through `this`
 * Put event in the global scope too.
2020-12-17 22:01:45 +01:00
Calixte Denizet
8bff4f1ea9 In order to simplify m-c code, move some in pdf.js
* move set/clear|Timeout/Interval and crackURL code in pdf.js
 * remove the "backdoor" in the proxy (used to dispatch event) and so return the dispatch function in the initializer
 * remove listeners if an error occured during sandbox initialization
 * add support for alert and prompt in the sandbox
 * add a function to eval in the global scope
2020-12-17 15:03:26 +01:00
Calixte Denizet
03814bd6a2 Don't use 'in' operator to check if key is in a Map 2020-12-16 16:00:12 +01:00
Calixte Denizet
6502ae889d JS -- Send events to the sandbox from annotation layer 2020-12-15 16:28:47 +01:00
Calixte Denizet
c6c9cc96bf Follow-up of #12707: Add an integration test for checkboxes as radio buttons
* Integration tests: Add a function to load a pdf and wait for a selected element
 * Integration tests: Add a function to close all the open pages
2020-12-15 00:00:04 +01:00
Tim van der Meij
959dc379ee
Merge pull request #12733 from Snuffleupagus/bug-1292316-test
Add a test-case for bug 1292316
2020-12-13 13:38:40 +01:00
Jonas Jenwald
7f89be5dbf Add a test-case for bug 1292316
It appears that the PDF document in [bug 1292316](https://bugzilla.mozilla.org/show_bug.cgi?id=1292316) now renders "correctly"[1] when compared to e.g. Adobe Reader and PDFium. Most likely this bug was fixed by a *somewhat* recent patch, or patches, to the `XRef.indexObjects` method.

Before just closing [bug 1292316](https://bugzilla.mozilla.org/show_bug.cgi?id=1292316) as WFM, I figured that it probably can't hurt to add it as a new test-case to avoid accidentally regressing this document in the future.

---
[1] Given that the XRef table is corrupt, and that we're forced to recover, there's generally speaking probably some question as to what actually constitutes "correct" in this case.
2020-12-12 13:24:31 +01:00
Jonas Jenwald
9adb225a7d Call done.fail correctly in the scripting_spec.js unit-tests
The `done.fail` method should *always* be called with a reason, to ensure that any errors are propagated as intended to the test results.
2020-12-12 12:41:47 +01:00
Tim van der Meij
d1848f5022
Merge pull request #12725 from brendandahl/remeasure-std
Use widths defined by font for standard fonts.
2020-12-11 20:36:19 +01:00
Jonas Jenwald
67e5db75d8 Ignore color-operators in Type3 glyphs beginning with a d1 operator (issue 12705)
Please refer to the PDF specification at https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1977497 and https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G7.3998470

This patch removes the color-operators in the evaluator, since that should be more efficient than doing it repeatedly in the main-thread when rendering the Type3 glyphs.
2020-12-11 15:49:13 +01:00
Brendan Dahl
45d9ab6e45 Use widths defined by font for standard fonts.
There doesn't seem to be anything definitive about this in
the spec, but from experimenting, it seems acrobat lets
PDFs override the widths of the standard fonts.
2020-12-10 15:30:39 -08:00
Tim van der Meij
00b4f86db3
Merge pull request #12717 from Snuffleupagus/issue-12714
Ensure that the /Annots-entry, on /Page-instances, is actually an Array (issue 12714)
2020-12-10 23:06:59 +01:00
Tim van der Meij
954ac3d944
Merge pull request #12719 from calixteman/emailvalidate
JS -- add function eMailValidate used to validate an email address
2020-12-10 22:19:37 +01:00
Brendan Dahl
31ea30ab25
Merge pull request #12668 from calixteman/interaction
Add some integration tests using puppeteer
2020-12-10 12:52:03 -08:00
Calixte Denizet
f94269c0d1 JS -- add function eMailValidate used to validate an email address 2020-12-10 21:51:37 +01:00
Tim van der Meij
7097114e0c
Merge pull request #12720 from calixteman/fix_co
Be sure that CalculationOrder is either null or a non-empty array
2020-12-10 21:43:35 +01:00
Calixte Denizet
5b42ac364a Add some integration tests using puppeteer and Jasmine
* run with `gulp integrationtest`
2020-12-10 20:55:15 +01:00
Calixte Denizet
c7b09b8efc JS -- fix printd issue with negative number 2020-12-10 18:43:04 +01:00
Calixte Denizet
25bf504ff5 Be sure that CalculationOrder is either null or a non-empty array 2020-12-10 16:02:11 +01:00
Jonas Jenwald
796a0d3155 Ensure that the /Annots-entry, on /Page-instances, is actually an Array (issue 12714)
In the referenced PDF document, the second and third page has *corrupt* /Annots-entries which contain /Dict-data rather than the intended Arrays.
2020-12-10 11:42:00 +01:00
Calixte Denizet
0f899edfc8 JS -- Add aform functions
* These functions aren't in the PDF specs but seems to be widely used
 * So the specs for these functions are:
   * http://www.sfu.ca/~wcs/ForGraham/Aladdin%20stuff/Acrobat%20Reader%205.0/Contents/MacOS/JavaScripts/AForm.js
   * pdfium source code
2020-12-07 19:37:34 +01:00
Tim van der Meij
d784af3f38
Merge pull request #12696 from timvandermeij/annotation-quadpoints
Fix non-standard quadpoints orders for annotations
2020-12-06 16:52:33 +01:00
Tim van der Meij
012e15f7a3
Fix non-standard quadpoints orders for annotations
This change requires us to use valid quadpoints arrays in the existing
unit tests too due to the normalization.
2020-12-06 16:02:41 +01:00
Jonas Jenwald
c549069ebd Replace the testMode parameter in src/pdf.sandbox.js with a constant, set using the pre-processor
This simplifies not just this code, but the unit-tests as well, and should be sufficient as far as I can tell.
Note also that currently, in the *built* `pdf.sandbox.js` file, there's even a line reading `testMode = testMode && false;` because of an accidentally flipped pre-processor statement.

Finally, in the `scripting_spec.js` unit-test, defines `sandboxBundleSrc` at the top of the file to make it easier to find and/or change it when necessary.
2020-12-05 23:04:34 +01:00
Jonas Jenwald
dc84c8a02a Update the link for the "pr8808" test-case (issue 12680)
This seems like a very minor issue, since in general we can't really help if domains are blocked from certain networks, however in this particular case I suppose that using the Internet Archive should work.
2020-12-02 15:06:09 +01:00
Brendan Dahl
956fcab967
Merge pull request #12631 from calixteman/app
JS -- Implement app object
2020-12-01 16:50:16 -08:00
Jonas Jenwald
c42029489e Run gulp lint --fix, to account for changes in Prettier version 2.2.1
Please refer to https://github.com/prettier/prettier/blob/master/CHANGELOG.md#221 for additional details.
2020-11-29 10:01:46 +01:00
Tim van der Meij
256068556d
Merge pull request #12662 from Snuffleupagus/issue-12402
Check the top-level /Pages dictionary when finding the trailer in `XRef.indexObjects` (issue 12402)
2020-11-25 21:54:41 +01:00
Jonas Jenwald
8a132f584d Check the top-level /Pages dictionary when finding the trailer in XRef.indexObjects (issue 12402)
In addition to the existing /Root and /Pages validation, also check that the /Pages-entry actually is a dictionary and that it has a valid /Count-entry.
This way we can avoid picking a trailer candidate which e.g. the `Catalog.numPages` getter will just end up rejecting, thus breaking PDF document loading completely.
2020-11-25 15:14:53 +01:00
Calixte Denizet
18b525de2e Parenthesis in names are not escaped when saving 2020-11-25 12:28:12 +01:00
Calixte Denizet
283aac4c53 JS -- Implement app object
* https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/AcrobatDC_js_api_reference.pdf
 * Add color, fullscreen objects + few constants.
2020-11-20 15:46:52 +01:00
Jonas Jenwald
01d12b465c [api-minor] Add "contentLength" to the information returned by the getMetadata method
Given that we already include the "Content-Disposition"-header filename, when it exists, it shouldn't hurt to also include the information from the "Content-Length"-header.
For PDF documents opened via a URL, which should be a very common way for the PDF.js library to be used, this will[1] thus provide a way of getting the PDF filesize without having to wait for the `getDownloadInfo`-promise to resolve[2].

With these API improvements, we can also simplify the filesize handling in the `PDFDocumentProperties` class.

---
[1] Assuming that the server is correctly configured, of course.

[2] Since that's not *guaranteed* to happen in general, with e.g. `disableAutoFetch = true` set.
2020-11-20 15:30:36 +01:00
Brendan Dahl
c88e805870
Merge pull request #12604 from calixteman/quickjs
JS -- Add a sandbox based on quickjs
2020-11-19 08:40:21 -08:00
Brendan Dahl
4ba28de260
Merge pull request #12567 from calixteman/hidden
[api-minor] JS -- hidden annotations must be built in case a script show them
2020-11-19 08:35:47 -08:00
Calixte Denizet
c7974e9996 JS -- Add a sandbox based on quickjs
* quickjs-eval.js has been generated using https://github.com/mozilla/pdf.js.quickjs/
 * lazy load of sandbox code
 * Rewrite tests to use the sandbox
 * Add a task `watch-sandbox` which update bundle pdf.sandbox.js on change in the sandbox code
2020-11-19 13:40:46 +01:00
Brendan Dahl
f39d87bff1
Merge pull request #12569 from calixteman/events
JS -- Fix events dispatchment and add tests
2020-11-16 10:26:29 -08:00
Calixte Denizet
611207d2c9 Fix popup for highlights without popup (follow-up of #12505)
* remove 1st param of _createPopup (almost useless for a method)
 * prepend popup div to avoid to have them on top of some highlights (and so "disable" partially mouse events)
 * add a ref test for issue #12504
2020-11-10 17:33:54 +01:00
Calixte Denizet
2dfac4cb41 JS -- Fix events dispatchment and add tests
* dispatch event to take into account calculation order
 * use a map for actions in Field
2020-11-10 17:26:29 +01:00
Calixte Denizet
b11592a756 JS -- hidden annotations must be built in case a script show them
* in some pdf, there are actions with "event.source.hidden = ..."
 * in order to handle visibility when printing, annotationStorage is extended to store multiple properties (value, hidden, editable, ...)
2020-11-10 12:48:34 +01:00
Calixte Denizet
a5279897a7 JS -- Add listener for sandbox events only if there are some actions
* When no actions then set it to null instead of empty object
* Even if a field has no actions, it needs to listen to events from the sandbox in order to be updated if an action changes something in it.
2020-11-09 18:37:59 +01:00
Jonas Jenwald
9602844368 Enable the ESLint no-useless-escape rule (PR 12551 follow-up)
Note that a number of these cases are covered by existing unit-tests, and a few others only matter for the development/build scripts.
Furthermore, I've also tried to the best of my ability to test each case *manually* to hopefully further reduce the likelihood of this patch introducing any bugs.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-useless-escape
2020-11-07 13:06:24 +01:00
Brendan Dahl
018fd43096
Merge pull request #12530 from calixteman/js_utils
JS -- Add 'util' object
2020-11-06 09:59:50 -08:00
Calixte Denizet
f69e848b1c JS -- Add 'util' object
This patch provides an implementation of the util object as described:
 * https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/js_api_reference.pdf#page=716
2020-11-06 18:12:29 +01:00
Tim van der Meij
be0794cb08
Merge pull request #12564 from hakubo/patch-1
Make sure that Popup is rendered next to trigger for textAnnotation
2020-11-06 00:02:11 +01:00
Tim van der Meij
646f895d35
Merge pull request #12568 from calixteman/defaultvalue
[api-minor] JS -- Add default value in annotation data
2020-11-05 22:53:21 +01:00
Calixte Denizet
39f5954729 JS -- Add default value in annotation data
* these values are used when a form is resetted
2020-11-05 13:44:23 +01:00
Jakub Olek
b642d49108 Make sure that Popup is rendered next to trigger for textAnnotation 2020-11-05 06:45:17 +01:00
Jonas Jenwald
ba761e42f0 Change the getVisibleElements helper function to take a parameter object
Given the number of parameters, and the fact that many of them are booleans, the call-sites are no longer particularly easy to read and understand. Furthermore, this slightly improves the formatting of the JSDoc-comment, since it needed updating as part of these changes anyway.

Finally, this removes an unnecessary `numViews === 0` check from `getVisibleElements`, since that should be *very* rare and more importantly that the `binarySearchFirstItem` function already has a fast-path for that particular case.
2020-11-04 12:15:04 +01:00
Tim van der Meij
e341e6e542
Merge pull request #12525 from brendandahl/mark-info
[api-minor] Implement API to get MarkInfo from the catalog.
2020-10-31 00:05:19 +01:00
Brendan Dahl
f5c821e9c3 [api-minor] Implement API to get MarkInfo from the catalog. 2020-10-30 10:59:45 -07:00
Jonas Jenwald
852c61ef57 Add a MurmurHash3_64.update unit-test for TypedArrays which share the same underlying ArrayBuffer (PR 12534 follow-up)
This probably ought to have been included in PR 12534, but better late than never I suppose, since it helps to more clearly demonstrate the bug in a way that a reference-test alone just cannot do.

When writing this unit-test I also noticed that it required a certain amount of "luck" to actually trigger the bug, prior to the patch, since it seems that the bug only reproduced for certain *unfortunate* sequences of TypedArray data. (The added unit-test contains one such, purposely simple, example.)
2020-10-28 12:42:04 +01:00
Tim van der Meij
ea4d88a330
Merge pull request #12395 from calixteman/checks
Render not displayed annotations in using normal appearance when printing
2020-10-28 00:11:10 +01:00
Calixte Denizet
6be2f84b4e Render not displayed annotations in using normal appearance when printing 2020-10-27 19:00:31 +01:00
Jonas Jenwald
d8da6afa4c Update the description of the test-case used in the escapeString unit-test
The description *itself* didn't escape the control characters correctly, leading to line-breaks being inserted in the test logs.
2020-10-27 11:47:40 +01:00
Jonas Jenwald
92477333f6 Load the non-test files with standard import statements when running the unit-tests
The unit-test files themselves shouldn't be loaded until Jasmine has been setup/configured, however that doesn't matter for the "normal" PDF.js library files. Hence we can simply `import` them in the standard way.
2020-10-27 11:47:35 +01:00
Jonas Jenwald
8eeb0bcbe4 Import the TestReporter, in the unit and font tests
This way it's no longer necessary to load it as a script in the html-files.
2020-10-27 11:30:15 +01:00
Jonas Jenwald
939af08ee1 Remove SystemJS usage from the font-tests
With these changes, SystemJS is now *only* used to load the worker-file in development mode (pending removal once https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 is fixed).
2020-10-26 23:42:44 +01:00
Jonas Jenwald
15a5f66973 Enable the ESLint no-var rule in the test/font/ folder
This was done automatically, using the `gulp lint --fix` command.
2020-10-26 23:42:44 +01:00
Jonas Jenwald
6967b9dd96 Modernize the font-tests
This patch first of all enables linting of the files in the `test/font/` folder, and secondly it also re-factors all test files to use native `import`/`export` statements. Finally, all tests are now loaded correctly, rather than being included as scripts through the `font_test.html` file.
2020-10-26 23:42:44 +01:00
Tim van der Meij
71a14be8e7
Merge pull request #12534 from Snuffleupagus/murmurhash-slice
Ensure that `MurmurHash3_64.update` handles `ArrayBuffer` input correctly, to avoid hash-collisions (issue 12533)
2020-10-26 23:34:03 +01:00
Jonas Jenwald
f2fa053c51 Ensure that MurmurHash3_64.update handles ArrayBuffer input correctly, to avoid hash-collisions (issue 12533)
Different fonts incorrectly end up with *identical* hashes, despite having different /ToUnicode data.
The issue, and it's very interesting that we've apparently not seen it before, appears to be caused by the fact that different /ToUnicode entries share the *same* underlying `ArrayBuffer`, which thus becomes problematic at the `const dataUint32 = new Uint32Array(data.buffer, 0, blockCounts);` line. The simplest solution thus seem to be to just *copy* the input, when it's an `ArrayBuffer`, rather than using it as-is. (Note that if we'd stringified the input, when calling `MurmurHash3_64.update`, the issue would also have been fixed. In this case, we're already creating an unique TypedArray.)
2020-10-26 16:27:33 +01:00
Jonas Jenwald
1c4495843c Load all unit-tests with native import, rather than SystemJS 2020-10-26 11:11:48 +01:00
Tim van der Meij
fe08ef4e39
Fix var conversions that ESLint could not do automatically
This mainly involves the `crypto_spec.js` file which declared most
variables before their usage, which is not really consistent with the
rest of the codebase. This also required reformatting some long arrays
in that file because otherwise we would exceed the 80 character line
limit. Overall, this makes the code more readable.
2020-10-25 16:17:12 +01:00
Tim van der Meij
3e2bfb5819
Convert var to const/let in the test/unit folder
This has been done automatically using ESLint's `--fix` argument.
2020-10-25 15:40:51 +01:00
Tim van der Meij
314ac21842
Disable var usage for the test/unit folder
This allows us to enforce that `var` is not used anymore in the unit
tests to modernize the code and prevent subtle bugs.
2020-10-25 15:38:52 +01:00
Tim van der Meij
b4ca3d55b8
Merge pull request #12508 from calixteman/button_fallback_font
Fallback font for buttons must be ZapfDingbats.
2020-10-24 18:56:12 +02:00
Tim van der Meij
0d1a874358
Merge pull request #12464 from baloone/Fix_getVisibleElements_in_rtl_direction
Fix getVisibleElements helper in RTL-locales
2020-10-24 17:03:57 +02:00
Calixte Denizet
37c86b2daa Fallback font for buttons must be ZapfDingbats.
Fix bug https://bugzilla.mozilla.org/show_bug.cgi?id=1669099.
2020-10-24 12:00:03 +02:00
Calixte Denizet
d2ef878702 Invalidate an annotation with no quadPoints (when it's required)
Some pdf softwares don't remove highlight annotations but make the QuadPoints array empty.
And the Rect for the annotation can be [-32768, -32768, 32768, 32768] so it leads to have a giant div which catches all the mouse events and make the pdf unusable when there are some forms elements.
2020-10-21 13:53:19 +02:00
Mohamed
b7b048e36c Fix getVisibleElements helper in RTL-locales 2020-10-20 23:34:09 +02:00
Calixte Denizet
e46e314867 Add a test for pdfDocument::fieldObjects 2020-10-17 19:48:40 +02:00
Calixte Denizet
c30a3a94f0 JS - Add a function in api to get the fields ids in AcroForm::CO 2020-10-17 12:56:40 +02:00
Tim van der Meij
ff2631493e
Merge pull request #12481 from calixteman/issue_12475
Get urls if any in AA::D dictionary for pushbuttons
2020-10-16 22:55:43 +02:00
Jonas Jenwald
3351d3476d Don't store complex data in PDFDocument.formInfo, and replace the fields object with a hasFields boolean instead
*This patch is based on a couple of smaller things that I noticed when working on PR 12479.*

 - Don't store the /Fields on the `formInfo` getter, since that feels like overloading it with unintended (and too complex) data, and utilize a `hasFields` boolean instead.
   This functionality was originally added in PR 12271, to help determine what kind of form data a PDF document contains, and I think that we should ensure that the return value of `formInfo` only consists of "simple" data.
   With these changes the `fieldObjects` getter instead has to look-up the /Fields manually, however that shouldn't be a problem since the access is guarded by a `formInfo.hasFields` check which ensures that the data both exists and is valid. Furthermore, most documents doesn't even have any /AcroForm data anyway.

 - Determine the `hasFields` property *first*, to ensure that it's always correct even if there's errors when checking e.g. the /XFA or /SigFlags entires, since the `fieldObjects` getter depends on it.

 - Simplify a loop in `fieldObjects`, since the object being accessed is a `Map` and those have built-in iteration support.

 - Use a higher logging level for errors in the `formInfo` getter, and include the actual error message, since that'd have helped with fixing PR 12479 a lot quicker.

 - Update the JSDoc comment in `src/display/api.js` to list the return values correctly, and also slightly extend/improve the description.
2020-10-16 12:47:27 +02:00
Calixte Denizet
ce3d3a6ff8 Get urls if any in AA::D dictionary for pushbuttons 2020-10-15 19:42:36 +02:00
Jonas Jenwald
5f8957e8df Fix the "should get form info when AcroForm is present" unit-test
The last unit-test didn't work correctly, since an error was thrown in `PDFDocument._hasOnlyDocumentSignatures` because the mocked `XRef`-instance wasn't actually being set correctly.

Also, updates the `XRefMock` to use `async` methods where appropriate.
2020-10-15 13:26:32 +02:00
Calixte Denizet
71ecc3129b Add the possibility to collect Javascript actions 2020-10-14 10:44:16 +02:00
Jani Pehkonen
935568c2f1 Fix invalid XUID entries in CFF fonts
In CFF fonts, entry `XUID` should be an array that has no more than
16 elements. In the issue, the length is 20, which causes the fonts to fail.
See Appendix B, "Implementation Limits" in PostScript Language Reference Manual
https://web.archive.org/web/20170218093716/https://www.adobe.com/products/postscript/pdfs/PLRM.pdf
Actually entries `XUID` and `UniqueID` are obsolete altogether.
https://blogs.adobe.com/CCJKType/2016/06/no-more-xuid-arrays.html
2020-10-05 17:38:01 +03:00
Jonas Jenwald
c5a1a6fdd5 Remove now unnecessary no-unsanitized/method disabling in test/unit/jasmine-boot.js
With the latest release of the `eslint-plugin-no-unsanitized` package, we no longer need to disable this rule; see https://github.com/mozilla/eslint-plugin-no-unsanitized/pull/150
2020-10-04 15:30:24 +02:00
Tim van der Meij
c10aac9a1d
Upgrade Puppeteer to version 5.3.1 2020-10-03 23:06:31 +02:00
Tim van der Meij
6ff1fe4ea9
Merge pull request #12333 from calixteman/tooltip
Add tooltip if any in annotations layer
2020-10-03 19:50:39 +02:00
calixteman
20b12d2bda Add tooltip if any in annotations layer 2020-10-02 10:11:18 +02:00
Jonas Jenwald
bd3b15b897 Use the cidToGidMap, if it exists, when building the glyph mapping for non-embedded composite fonts (issue 12418) 2020-09-28 14:40:43 +02:00
Tim van der Meij
120c5c2261
Merge pull request #12409 from Snuffleupagus/bug-1627030
Compute the `transformOrigin` correctly, for negative values, when rendering `AnnotationElement`s (bug 1627030)
2020-09-24 23:48:21 +02:00
Calixte Denizet
5af352e65a Need to reset the streams when printing 2020-09-24 19:13:09 +02:00
Jonas Jenwald
fca53a8eb0 Compute the transformOrigin correctly, for negative values, when rendering AnnotationElements (bug 1627030)
This changes the `transformOrigin` calculations in `AnnotationElement._createContainer` and `PopupAnnotationElement.render`, to ensure that e.g. the clickable area of annotations and/or popups are both positioned correctly.

The problem occurs for *negative* values, since they're not negated correctly because of how the `transformOrigin` strings were build; see issue 12406 for a more in-depth explanation. Previously, for negative values, the `transformOrigin` strings would thus be ignored since they're not valid.
2020-09-24 10:28:29 +02:00
Jonas Jenwald
2497e8eab9 Prevent errors if the InkList property, in InkAnnotations, is missing and/or not an Array (issue 12392)
To prevent a future bug, the `Vertices` property in PolylineAnnotations are handled the same way.
2020-09-19 15:34:32 +02:00
Calixte Denizet
d51e7e86ff Use the same kind of strings for radio values 2020-09-16 18:47:25 +02:00
Calixte Denizet
16dd5403c7 Set parent of radio annotation even if there is no 'V' field 2020-09-15 14:41:57 +02:00
Tim van der Meij
b0c7a74a0c
Merge pull request #12361 from Snuffleupagus/_getSaveFieldResources
Ensure that all necessary /Font resources are included when saving a `WidgetAnnotation`-instance (issue 12294)
2020-09-15 00:09:31 +02:00
Tim van der Meij
9d7b1d89ca
Merge pull request #12370 from timvandermeij/annotation-reset
Implement resetting of created streams for annotations
2020-09-14 23:16:17 +02:00
Tim van der Meij
3ecd984758
Implement resetting of created streams for annotations 2020-09-14 23:08:50 +02:00
Calixte Denizet
0c8de5aaf9 Replace \n and \r by \n and \r when saving a string 2020-09-14 17:34:39 +02:00
Jonas Jenwald
c992b8e460 Ensure that all necessary /Font resources are included when saving a WidgetAnnotation-instance (issue 12294)
This patch contains a possible approach for fixing issue 12294, which compared to other PRs is purposely limited to the affected `WidgetAnnotation` code.

As mentioned elsewhere, considering that we're (at least for now) trying to fix *one specific* case, I think that we should avoid modifying the `Dict` primitive[1] and/or avoid a solution that (indirectly) modifies an existing `Dict`-instance[2].
This patch simply fixes the issue at hand, since that seems easiest for now, and I'd suggest that we worry about a more general approach if/when that actually becomes necessary.

Hence the solution implemented here, for `WidgetAnnotation`, is to simply use a combination of the local *and* AcroForm /DR resources during OperatorList-parsing to ensure that things work correctly regardless of where a particular /Font resource is found.
For saving of form-data, on the other hand, we want to avoid increasing the file-size unnecessarily and need to be smarter than just merging all of the available resources. To achive this, a new `WidgetAnnotation._getSaveFieldResources` method will when necessary produce a combined resources `Dict` with only the minimum amount of data from the AcroForm /DR resources included.

---
[1] You want to avoid anything that could cause the general `Dict` implementation to become slower, or more complex, just for handling an edge-case in my opinion.

[2] If an existing `Dict`-instance is modified unexpectedly, that could very easily lead to problems elsewhere since e.g. `Dict`-instances created during parsing are not expected to be changed.
2020-09-14 15:22:40 +02:00
Jonas Jenwald
18767445a4 Skip failing FBF tests, when running makeref, in Firefox as well
This will allow `makeref` to run "successfully" on the bots, since in the current state testing/makeref is just overall broken.
Obviously we still need to figure what's causing the intermittent failures, and fix them, but let's at least unblock things for now; see issue 12371.
2020-09-13 12:24:04 +02:00
Calixte Denizet
fc154590e8 Dict keys need to be escaped too when saving 2020-09-11 12:25:05 +02:00
Calixte Denizet
dc4eb71ff1 PDF names need to be escaped when saving 2020-09-10 16:08:13 +02:00
Tim van der Meij
f9d56320f5
Merge pull request #12349 from calixteman/followup_12344
Follow-up of pr #12344
2020-09-09 23:40:53 +02:00
Calixte Denizet
908e7ae5e4 Set the modification date to the current day when saving 2020-09-09 19:06:39 +02:00
Calixte Denizet
64a6efd95e Follow-up of pr #12344 2020-09-09 11:46:02 +02:00
Brendan Dahl
e51e9d1f33
Merge pull request #12345 from calixteman/save_btn
Don't try to save something for a button which is neither a checkbox nor a radio
2020-09-08 15:44:04 -07:00
calixteman
68b99c59ee
Save form data in XFA datasets when pdf is a mix of acroforms and xfa (#12344)
* Move display/xml_parser.js in shared to use it in worker

* Save form data in XFA datasets when pdf is a mix of acroforms and xfa

Co-authored-by: Brendan Dahl <brendan.dahl@gmail.com>
2020-09-08 15:13:52 -07:00
Calixte Denizet
7e5026dfc5 Don't try to save something for a button which is neither a checkbox nor a radio 2020-09-08 20:47:46 +02:00
Tim van der Meij
20c891542b
Merge pull request #12269 from calixteman/highlight
Add support for missing appearances for hightlights, strikeout, squiggly and underline annotations.
2020-09-06 22:25:36 +02:00
Calixte Denizet
65ecd981fe Add support for missing appearances for hightlights, strikeout, squiggly and underline annotations. 2020-09-06 15:40:15 +02:00
Jonas Jenwald
6c8f1f7d6f Run gulp lint --fix, to account for changes in Prettier version 2.1.x 2020-09-06 12:23:59 +02:00
Jonas Jenwald
60c9556b66 Restore the "gets expected character types" test, in test/unit/pdf_find_utils_spec.js, to its intended formatting
This was "broken" by the introduction of Prettier, however a recent update (version `2.1.x`) will now correctly ignore these escape sequences.
2020-09-06 12:23:09 +02:00
Jonas Jenwald
784a420027 Add support, in Dict.merge, for merging of "sub"-dictionaries
This allows for merging of dictionaries one level deeper than previously. This could be useful e.g. for /Resources dictionaries, where you want to e.g. merge their respective /Font dictionaries (and other) together rather than picking just the first one.
2020-08-30 23:18:32 +02:00
Jonas Jenwald
9ba5f9fa34 Create an OptionalContentConfig-instance once for each task, when running the reference test-suite
This avoids the need to make a round-trip to the worker-thread for *every* single page that's being tested, which should thus be more efficient.
2020-08-30 16:28:40 +02:00
Tim van der Meij
06b53d770a
Merge pull request #12259 from brendandahl/cmap-fix
Fix handling of symbolic fonts and unicode cmaps.
2020-08-30 16:01:24 +02:00
Brendan Dahl
45e8a31cc0 Fix handling of symbolic fonts and unicode cmaps.
In issue 12120, the font has a 1,0 cmap and is marked symbolic which
according to the spec means we should directly use the cmap instead of
the extra steps that are defined in 9.6.6.4.

However, just fixing that caused bug 1057544 to break. The font in bug
1057544 has a 0,1 cmap (Unicode 1.1) which we were not using, but is
easy to support. We're also easily able to support some of the other
unicode cmaps, so I added those as well.

There was also a second issue with bug 1057544, the cmap doesn't have
a mapping for the "quoteright" glyph, but it is defined in the post
table. To handle this, I've moved post table as a  fallback for any
font that has an encoding.
2020-08-27 14:33:11 -07:00
Calixte Denizet
ba94f04ba3 Bug 1661226 - Push button are not rendered with renderInteractiveForms enabled 2020-08-27 10:45:14 +02:00
Tim van der Meij
280207c740
Redo the form type detection logic and include unit tests
Good form type detection is important to get reliable telemetry and to
only show the fallback bar if a form cannot be filled out by the user.

PDF.js only supports AcroForm data, so XFA data is explicitly unsupported
(tracked in issue #2373). However, the previous form type detection
couldn't separate AcroForm and XFA well enough, causing form type
telemetry to be incorrect sometimes and the fallback bar to be shown for
forms that could in fact be filled out by the user.

The solution in this commit is found by studying the specification and
the form documents that are available to us. In a nutshell the rules are:

- There is XFA data if the `XFA` entry is a non-empty array or stream.
- There is AcroForm data if the `Fields` entry is a non-empty array and
  it doesn't consist of only document signatures.

The document signatures part was not handled in the old code, causing a
document with only XFA data to also be marked as having AcroForm data.
Moreover, the old code didn't check all the data types.

Now that AcroForm and XFA can be distinguished, the viewer is configured
to only show the fallback bar for documents that only have XFA data. If
a document also has AcroForm data, the viewer can use that to render the
form. We have not found documents where the XFA data was necessary in
that case.

Finally, we include unit tests to ensure that all cases are covered and
move the form type detection out of the `parse` function so that it's
only executed if the document information is actually requested
(potentially making initial parsing a tiny bit faster).
2020-08-25 23:28:55 +02:00
Tim van der Meij
f20f0bcc78
Move the AcroForm logic from the document to the catalog
The `AcroForm` entry is part of the catalog, not of the document, so its
logic should be placed there instead. The document should look in the
catalog to fetch it, and not have knowledge of `catDict`, which is a
member internal to the catalog.

Moreover, make the AcroForm member private on the document instance. It's
only used internally and was also never intended to be public. For users
it's exposed by the `getMetadata` API endpoint as `IsAcroFormPresent`.
Only a boolean is exposed, so we now also only store the boolean on the
document instance.

Finally, the annotation code needs access to the full AcroForm
dictionary, so it's updated to fetch the data from the catalog instead
of the document that now only holds the boolean.
2020-08-25 23:28:55 +02:00
Tim van der Meij
73bc8e1d9f
Include forms/print reference tests for the document from #12233
In addition to the unit tests these reference tests make sure that this
document, that triggered some edge cases in our code, can be rendered
and printed successfully now.
2020-08-23 13:00:02 +02:00
Tim van der Meij
a8efc0296b
Obtain the export values for choice widgets from the normal appearance
The down appearance (`D`) is optional and not available in the document
from #12233, so the checkboxes are never saved/printed as checked
because the checked appearance is based on the export value that is
missing because the `D` entry is not available.

Instead, we should use the normal appearance (`N`) since that one is
required and therefore always available.

Finally, the /Off appearance is optional according to section 12.7.4.2.3
of the specification, so that needs to be taken into account to match
the specification and to fix reference test failures for the
`annotation-button-widget-print` test. That is a file that doesn't
specify an /Off appearance in the normal appearance dictionary.
2020-08-23 13:00:02 +02:00
Tim van der Meij
1b82ad8fff
Decode widget form values consistently
The helper method `_decodeFormValue` is used to ensure that it happens
in one place. Note that form values are field values, display values
and export values.
2020-08-23 13:00:01 +02:00
Tim van der Meij
3c790936c1
Merge pull request #12247 from timvandermeij/acroform-choice-null
Improve the field value parsing for choice widgets to handle `null` values
2020-08-21 23:17:20 +02:00
Aki Sasaki
83365a3756 confirm if leaving a modified form without saving 2020-08-20 17:23:06 -07:00
Tim van der Meij
12c20772ac
Improve the field value parsing for choice widgets to handle null values
The specification states that the field value is `null` if no item is
selected and we didn't handle this case properly. Even though this did
not break the rendering because we always convert the value to an array
and the `includes` check in the display layer would simply not match,
the field value would be `[null]` which is not expected and strange from
an API perspective.

This commit fixes that by ensuring that we return an empty array in
case the field value is `null`. The API therefore still always gives an
array for the field value, but now the code is more specific so that the
value is either an empty array or an array of strings.
2020-08-19 23:27:50 +02:00
Tim van der Meij
3ca037af78
Combine the choice widget field value unit tests into one parametrized unit test
This commit follows the same pattern as another unit test in this file
and both reduces existing and future code duplication (since the next
commit will extend this test with an additional input).
2020-08-19 23:27:24 +02:00
Tim van der Meij
483efedc66
Include reference tests for text/choice/button widget printing 2020-08-18 12:36:33 +02:00
Tim van der Meij
8ccf09d5dd
Implement reference testing for printing
This commit includes support for rendering pages in printing mode,
which, when combined with annotation storage data, is useful for testing
if form data is correctly rendered onto the printed canvas.
2020-08-18 12:36:33 +02:00
Jonas Jenwald
1058f16605 Add (basic) support for transfer functions to Images (issue 6931, bug 1149713)
This is *similar* to the existing transfer function support for SMasks, but extended to simple image data.
Please note that the extra amount of data now being sent to the worker-thread, for affected /ExtGState entries, is limited to *at most* 4 `Uint8Array`s each with a length of 256 elements.

Refer to https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G9.1658137 for additional details.
2020-08-17 10:34:12 +02:00
Calixte Denizet
1a6816ba98 Add support for saving forms 2020-08-12 10:32:59 +02:00
Brendan Dahl
7fb01f9f2a
Merge pull request #12186 from brendandahl/loca-2
Fix bad truetype loca tables.
2020-08-10 20:34:19 -07:00
Brendan Dahl
f6dff81223 Fix bad truetype loca tables.
Some fonts have loca tables that aren't sorted or use 0 as an offset to
signal a missing glyph. This fixes the bad loca tables by sorting them
and then rewriting the loca table and potentially re-ordering the glyf
table to match.

Fixes #11131 and bug 1650302.
2020-08-10 14:15:49 -07:00
Calixte Denizet
88b112ab0c Support comb textfields for printing 2020-08-09 14:41:26 +02:00
Tim van der Meij
b061c300b4
Merge pull request #12176 from calixteman/multiline
Support multiline textfields for printing
2020-08-09 13:37:36 +02:00
Calixte Denizet
cd8bb7293b Support multiline textfields for printing 2020-08-09 12:14:34 +02:00
Tim van der Meij
ea29d4e0d4
Merge pull request #12187 from Snuffleupagus/issue-4398-test
Add a test-case for issue 4398
2020-08-08 17:45:44 +02:00
Jonas Jenwald
128e41d6be Add a test-case for issue 4398
Issue 4398 was fixed by PR 4437, however a test-case wasn't included as far as I can tell. Given that PR 12186 is now in the process of re-factoring that code, adding a test-case cannot hurt as far as I'm concerned.
2020-08-08 11:50:19 +02:00
Jonathan Grimes
ac723a1760 Allow loading pdf fonts into another document. 2020-08-08 02:52:32 +00:00
Calixte Denizet
1747d259f9 Support textfield and choice widgets for printing 2020-08-06 14:45:23 +02:00
Jonas Jenwald
13e44c0776 Attempt to reduce intermittent failures in the "multiple render() on the same canvas" unit-test
This patch should *hopefully* remove the intermittent unit-test failure, by using the *same* `optionalContentConfigPromise` for both `renderTask`s and thus get more predictable timing behaviour.
2020-08-04 22:31:24 +02:00
Brendan Dahl
ac494a2278 Add support for optional marked content.
Add a new method to the API to get the optional content configuration. Add
a new render task param that accepts the above configuration.
For now, the optional content is not controllable by the user in
the viewer, but renders with the default configuration in the PDF.

All of the test files added exhibit different uses of optional content.

Fixes #269.

Fix test to work with optional content.

- Change the stopAtErrors test to ensure the operator list has something,
  instead of asserting the exact number of operators.
2020-08-04 09:26:55 -07:00
Tim van der Meij
00a8b42e67
Merge pull request #12102 from ineiti/add_types_annotations
Add types annotations
2020-08-02 16:45:37 +02:00
Tim van der Meij
5a66c56eca
Merge pull request #12108 from calixteman/radio
Add support for radios printing
2020-08-02 14:47:46 +02:00
Tim van der Meij
b789a0e216
Log the total number of tests and the random seed in the test runner
This might make debugging intermittent failures a bit easier in the
future because it allows us to spot unexpected differences in the number
of tests being run and allows us to run the tests locally in the same
order in case of intermittent failures.
2020-08-01 21:09:01 +02:00
Tim van der Meij
662ac5548f
Log suite start failures in the test runner 2020-08-01 21:02:20 +02:00
Tim van der Meij
c19d76f9b8
Use a for...of loop in the specDone handler in the test reporter
Moreover, remove a left-over reference to `test.py` since that was
ported to JavaScript a long time ago.
2020-08-01 20:50:30 +02:00
Jonas Jenwald
05baa4c89f
Revert "[api-minor] Allow loading pdf fonts into another document." 2020-08-01 12:52:39 +02:00
Tim van der Meij
173b92a873
Merge pull request #12131 from jsg2021/issue-8271
[api-minor] Allow loading pdf fonts into another document.
2020-08-01 01:13:41 +02:00
Jonathan Grimes
9b16b8ef71
Allow loading pdf fonts into another document. 2020-07-31 11:41:48 -05:00
Jonas Jenwald
346afd1e1c [api-minor] Fix the AnnotationStorage usage properly in the viewer/tests (PR 12107 and 12143 follow-up)
*The [api-minor] label probably ought to have been added to the original PR, given the changes to the `createAnnotationLayerBuilder` signature (if nothing else).*

This patch fixes the following things:
 - Let the `AnnotationLayer.render` method create an `AnnotationStorage`-instance if none was provided, thus making the parameter *properly* optional. This not only fixes the reference tests, it also prevents issues when the viewer components are used.
 - Stop exporting `AnnotationStorage` in the official API, i.e. the `src/pdf.js` file, since it's no longer necessary given the change above. Generally speaking, unless absolutely necessary we probably shouldn't export unused things in the API.
 - Fix a number of JSDocs `typedef`s, in `src/display/` and `web/` code, to actually account for the new `annotationStorage` parameter.
 - Update `web/interfaces.js` to account for the changes in `createAnnotationLayerBuilder`.
 - Initialize the storage, in `AnnotationStorage`, using `Object.create(null)` rather than `{}` (which is the PDF.js default).
2020-07-31 16:32:46 +02:00
Calixte Denizet
f22e702ecc Amend test for checkboxes printing to test the unchecked appearance 2020-07-31 14:39:11 +02:00
Calixte Denizet
538017f7a7 Add support for radios printing 2020-07-31 14:31:49 +02:00
Aki Sasaki
7bb65bab7f fix reftests after #12107
The f1040-annotations reftest started hanging after #12107. We traced
this to `TypeError: can't access property "getOrCreateValue", storage is
undefined`.

We essentially need to add `annotationStorage` to the parameters in
test/driver.js.
2020-07-30 12:25:27 -07:00
Linus Gasser
f1bbfdc16d Add typescript definitions
This PR adds typescript definitions from the JSDoc already present.
It adds a new gulp-target 'types' that calls 'tsc', the typescript
compiler, to create the definitions.

To use the definitions, users can simply do the following:

```
import {getDocument, GlobalWorkerOptions} from "pdfjs-dist";
import pdfjsWorker from "pdfjs-dist/build/pdf.worker.entry";
GlobalWorkerOptions.workerSrc = pdfjsWorker;

const pdf = await getDocument("file:///some.pdf").promise;
```

Co-authored-by: @oBusk
Co-authored-by: @tamuratak
2020-07-30 11:10:37 +02:00
Tim van der Meij
eb4d6a0652
Merge pull request #12107 from calixteman/checkbox
Add support for checkboxes printing
2020-07-30 00:11:41 +02:00
Calixte Denizet
cb60523a15 Add support for checkboxes printing 2020-07-29 16:42:57 +02:00
Tim van der Meij
65e76a3c6b
Fix a bug in the temporary folder check in the test runner
The `noPrompt` option doesn't exist and should be `noPrompts`.
2020-07-26 20:41:19 +02:00
Tim van der Meij
01e2610cf4
Merge pull request #12126 from Snuffleupagus/unittest-shall_fail_cleanup
Attempt to reduce intermittent failures in the "cleans up document resources during rendering of page" unit-test
2020-07-26 14:33:12 +02:00
Jonas Jenwald
86a8fd9810 Attempt to reduce intermittent failures in the "cleans up document resources during rendering of page" unit-test
This patch should *hopefully* remove the `Unhandled promise rejection: ...` errors, by returning the "final" promise. Also, by pausing/delaying of rendering slightly the likelihood of the test failing in the first place should thus be reduced.
2020-07-26 14:05:46 +02:00
Jonas Jenwald
e4ad91be05 Include the browser name when printing unit-test results
This uses a similar format to the reference-test logging, and will help determine in *exactly* which browser the failure occurred (since the tests run concurrently).
2020-07-26 12:54:16 +02:00
Calixte Denizet
584902dbf8 Add an annotation storage in order to save annotation data in acroforms 2020-07-24 10:50:11 +02:00
Jonas Jenwald
ea8e432c45 Add a getRawValues method, to Dict instances, to provide an easier way of getting all *raw* values
When the old `Dict.getAll()` method was removed, it was replaced with a `Dict.getKeys()` call and `Dict.get(...)` calls (in a loop).
While this pattern obviously makes a lot of sense in many cases, there's some instances where we actually want the *raw* `Dict` values (i.e. `Ref`s where applicable). In those cases, `Dict.getRaw(...)` calls are instead used within the loop. However, by introducing a new `Dict.getRawValues()` method we can reduce the number of (strictly unnecessary) function calls by simply getting the *raw* `Dict` values directly.
2020-07-17 16:32:00 +02:00
Jonas Jenwald
6381b5b08f Add a size getter, to Dict instances, to provide an easier way of checking the number of entries
This removes the need to manually call `Dict.getKeys()` and check its length.
2020-07-17 16:06:11 +02:00
Tim van der Meij
e63d1ebff5
Merge pull request #12087 from Snuffleupagus/LocalGStateCache
Add local caching of "simple" Graphics State (ExtGState) data in `PartialEvaluator.{getOperatorList, getTextContent}` (issue 2813)
2020-07-17 16:02:45 +02:00
Tim van der Meij
29adbb7cd7
Implement unit tests for the RefSetCache primitive
This primitive did not have unit test coverage yet, which is important
for upcoming refactoring of the primitive.
2020-07-17 13:35:29 +02:00
Jonas Jenwald
90eb579713 Add local caching of "simple" Graphics State (ExtGState) data in PartialEvaluator.getOperatorList (issue 2813)
This patch will help pathological cases the most, with issue 2813 being a particularily problematic example. While there's only *four* `/ExtGState` resources, there's a total `29062` of `setGState` operators. Even though parsing of a single `/ExtGState` resource is quite fast, having to re-parse them thousands of times does add up quite significantly.

For simplicity we'll only cache "simple" `/ExtGState` resource, since e.g. the general `SMask` case cannot be easily cached (without re-factoring other code, which may have undesirable effects on general parsing).

By caching "simple" `/ExtGState` resource, we thus improve performance by:
 - Not having to fetch/validate/parse the same `/ExtGState` data over and over.
 - Handling of repeated `setGState` operators becomes *synchronous* during the `OperatorList` building, instead of having to defer to the event-loop/microtask-queue since the `/ExtGState` parsing is done asynchronously.

---

Obviously I had intended to include (standard) benchmark results with this patch, but for reasons I don't understand the test run-time (even with `master`) of the document in issue 2813 is *a lot* slower than in the development viewer (making normal benchmarking infeasible).
However, testing this manually in the development viewer (using `pdfBug=Stats`) shows a *reduction* of `~10 %` in the rendering time of the PDF document in issue 2813.
2020-07-14 10:34:43 +02:00
Jonas Jenwald
d4d7ac1b88 Stop special-casing the (very unlikely) "no /XObject found"-scenario, when parsing OPS.paintXObject operators, in PartialEvaluator.{getOperatorList, getTextContent}
Originally there weren't any (generally) good ways to handle errors gracefully, on the worker-side, however that's no longer the case and we can simply fallback to the existing `ignoreErrors` functionality instead.
Also, please note that the "no `/XObject` found"-scenario should be *extremely* unlikely in practice and would only occur in corrupt/broken documents.

Note that the `PartialEvaluator.getOperatorList` case is especially bad currently, since we'll simply (attempt to) send the data as-is to the main-thread. This is quite bad, since in a corrupt/broken document the data *could* contain anything and e.g. be unclonable (which would cause breaking errors).
Also, we're (obviously) not attempting to do anything with this "raw" `OPS.paintXObject` data on the main-thread and simply ensuring that we never send it definately seems like the correct approach.
2020-07-12 21:59:59 +02:00
Tim van der Meij
7dabc5ecc8
Merge pull request #12063 from Snuffleupagus/issue-10989
Tweak the heuristic, in `src/core/jpg.js`, that handles JPEG images with a wildly incorrect SOF (Start of Frame) `scanLines` parameter (issue 10989)
2020-07-11 00:05:11 +02:00
Jonas Jenwald
d18cf47419 Remove the special handling, used when creating Indexed ColorSpaces, for the case where the lookup-data is a Stream
This special-case was added in PR 1992, however it became unnecessary with the changes in PR 4824 since all of the ColorSpace parsing is now done on the worker-thread (with only RGB-data being sent to the main-thread).
2020-07-10 17:22:55 +02:00
Jonas Jenwald
4cc6797f17 Re-factor the idFactory functionality, used in the core/-code, and move the fontID generation into it
Note how the `getFontID`-method in `src/core/fonts.js` is *completely* global, rather than properly tied to the current document. This means that if you repeatedly open and parse/render, and then close, even the *same* PDF document the `fontID`s will still be incremented continuously.

For comparison the `createObjId` method, on `idFactory`, will always create a *consistent* id, assuming of course that the document and its pages are parsed/rendered in the same order.

In order to address this inconsistency, it thus seems reasonable to add a new `createFontId` method on the `idFactory` and use that when obtaining `fontID`s. (When the current `getFontID` method was added the `idFactory` didn't actually exist yet, which explains why the code looks the way it does.)
*Please note:* Since the document id is (still) part of the `loadedName`, it's thus not possible for different documents to have identical font names.
2020-07-07 16:33:31 +02:00
Jonas Jenwald
1d66fce781 Tweak the heuristic, in src/core/jpg.js, that handles JPEG images with a wildly incorrect SOF (Start of Frame) scanLines parameter (issue 10989) 2020-07-06 13:06:49 +02:00
Jonas Jenwald
4a7e29865d [api-minor] Use the NodeCanvasFactory/NodeCMapReaderFactory classes as defaults in Node.js environments (issue 11900)
This moves, and slightly simplifies, code that's currently residing in the unit-test utils into the actual library, such that it's bundled with `GENERIC`-builds and used in e.g. the API-code.

As an added bonus, this also brings out-of-the-box support for CMaps in e.g. the Node.js examples.
2020-07-02 04:44:23 +02:00
Brendan Dahl
fe3df495cc
Merge pull request #12040 from wojtekmaj/replace-non-inclusive
Replace non-inclusive "whitelist" term with "allowlist"
2020-07-01 15:41:41 -07:00
Jonas Jenwald
fef24658e7 Adjust the heuristics used when dealing with rectangles, i.e. re operators, with zero width/height (issue 12010) 2020-07-02 00:02:49 +02:00
Tim van der Meij
75fed02630
Merge pull request #12043 from Snuffleupagus/issue-4260-test
Add a reduced test-case for issue 4260 (PR 4521 follow-up)
2020-07-01 23:51:21 +02:00
Jonas Jenwald
e451cabe37 Add a reduced test-case for issue 4260 (PR 4521 follow-up) 2020-06-30 09:26:41 +02:00
Wojciech Maj
78970bbbe1
Replace non-inclusive "whitelist" term with "allowlist" 2020-06-29 17:15:14 +02:00
Jonas Jenwald
4a5b68e077 Add at least *some* test-coverage for the RenderTask.onContinue functionality
The default viewer, and thus Firefox, depends on the `RenderTask.onContinue` functionality to pause/continue rendering (such that the most visible page always renders first).
Despite this functionality thus being very important, it has however never actually been tested *at all* as far as I can tell. Hence this patch which adds a new boolean `renderTaskOnContinue` parameter (`false` by default), that can be used to force a reference-test to use the `RenderTask.onContinue` code-path in the `InternalRenderTask` class.

Note that I purposely made this new reference-test behaviour *optional*, since I didn't want to negatively affect the general runtime of the tests (given that there's a slight delay added to the rendering). Also, for e.g. benchmarking you'd most likely want to stay away from the `RenderTask.onContinue` functionality for similar reasons.
2020-06-29 00:38:34 +02:00
Jonas Jenwald
28d2ada59c Attempt to detect inline images which contain "EI" sequence in the actual image data (issue 11124)
This should reduce the possibility of accidentally truncating some inline images, while *not* causing the "EI" detection to become significantly slower.[1]
There's obviously a possibility that these added checks are not sufficient to catch *every* single case of "EI" sequences within the actual inline image data, but without specific test-cases I decided against over-engineering the solution here.

*Please note:* The interpolation issues are somewhat orthogonal to the main issue here, which is the truncated image, and it's already tracked elsewhere.

---
[1] I've looked at the issue a few times, and this is the first approach that I was able to come up with that didn't cause *unacceptable* performance regressions in e.g. issue 2618.
2020-06-26 13:15:06 +02:00
Jonas Jenwald
19d7976483 Improve (local) caching of parsed ColorSpaces (PR 12001 follow-up)
This patch contains the following *notable* improvements:
 - Changes the `ColorSpace.parse` call-sites to, where possible, pass in a reference rather than actual ColorSpace data (necessary for the next point).
 - Adds (local) caching of `ColorSpace`s by `Ref`, when applicable, in addition the caching by name. This (generally) improves `ColorSpace` caching for e.g. the SMask code-paths.
 - Extends the (local) `ColorSpace` caching to also apply when handling Images and Patterns, thus further reducing unneeded re-parsing.
 - Adds a new `ColorSpace.parseAsync` method, almost identical to the existing `ColorSpace.parse` one, but returning a Promise instead (this simplifies some code in the `PartialEvaluator`).
2020-06-24 23:53:10 +02:00
Jonas Jenwald
e22bc483a5 Re-factor ColorSpace.parse to take a parameter object, rather than a bunch of (randomly) ordered parameters
Given the number of existing parameters, this will avoid needlessly unwieldy call-sites especially with upcoming changes in later patches.
2020-06-24 23:53:10 +02:00
Jonas Jenwald
e18fa3fc45 Tweak the QueueOptimizer to recognize OPS.paintImageMaskXObject operators as *repeated* when the "skew" transformation matrix elements are non-zero (issue 8078)
*First of all, I should mention that my understanding of the finer details of the `QueueOptimizer` (and its related `CanvasGraphics` methods) is somewhat limited.*
Hence I'm not sure if there's actually a very good reason for *only* considering ImageMasks where the "skew" transformation matrix elements are zero as *repeated*, however simply looking at the code I just don't see why these elements cannot be non-zero as long as they are *all identical* for the ImageMasks.
Furthermore, looking at the *group* case (which is what we're currently falling back to), there's no particular limitation placed upon the transformation matrix elements.

While this patch obviously isn't enough to *completely* fix the issue, since there should be a visible Pattern rendered as well[1], it seem (at least to me) like enough of an improvement that submitting this is justified.
With these changes the referenced PDF document will no longer hang the *entire* browser, and rendering also finishes in a *reasonable* time (< 10 seconds for me) which seem fine given the *huge* number of identical inline images present.[2]

---
[1] Temporarily changing the Pattern to a solid color *does* render the correct/expected area, which suggests that the remaining problem is a pre-existing issue related to the Pattern-handling itself rather than the `QueueOptimizer` functionality.

[2] The document isn't exactly rendered immediately in e.g. Adobe Reader either.
2020-06-20 12:18:48 +02:00
Jonas Jenwald
4b51bcc733 Ensure that PDFImage.buildImage won't accidentally swallow errors, e.g. from ColorSpace parsing (issue 6707, PR 11601 follow-up)
Because of a really stupid `Promise`-related mistake on my part, when re-factoring `PDFImage.buildImage` during the `NativeImageDecoder` removal, we're no longer re-throwing errors occuring during image parsing/decoding as intended.
The result is that some (fairly) corrupt documents will never finish loading, and unfortunately there were apparently no sufficiently corrupt images in the test-suite to catch this.
2020-06-13 15:02:37 +02:00
Jonas Jenwald
88fdb482b0 Move the isEmptyObj helper function from src/shared/util.js to test/unit/test_utils.js
Since this helper function is no longer used anywhere in the main code-base, but only in a couple of unit-tests, it's thus being moved to a more appropriate spot.

Finally, the implementation of `isEmptyObj` is also tweaked slightly by removing the manual loop.
2020-06-09 17:50:16 +02:00
Tim van der Meij
550a38f1ba
Improve unit test coverage for primitives
This commit includes unit tests for:

- `isEOF`
- `isStream`
- `Ref`'s string representation and caching
- `Dict`'s XRef assignment
2020-06-07 17:31:40 +02:00
Tim van der Meij
2bd0690fdd
Convert var to const/let in test/unit_primitives_spec.js 2020-06-07 15:04:24 +02:00
Carlos Rodríguez
802aa14a99 Jpeg encoded with RGB -instead of YCbCr- write the components index as "RGB" in ASCII to say it so
On ISO/IEC 10918-6:2013 (E), section 6.1: (http://www.itu.int/rec/T-REC-T.872-201206-I/en)

"Images encoded with three components are assumed to be RGB data encoded as YCbCr unless the image contains an APP14 marker segment as specified in 6.5.3, in which case the colour encoding is considered either RGB or YCbCr according to the application data of the APP14 marker segment"

But common jpeg libraries consider RGB too if components index are ASCII R (0x52), G (0x47) and B (0x42): https://stackoverflow.com/questions/50798014/determining-color-space-for-jpeg/50861048

Issue #11931
2020-06-04 15:08:47 +02:00
Tim van der Meij
3b615e4ca3
Merge pull request #11601 from Snuffleupagus/rm-nativeImageDecoderSupport
[api-minor] Decode all JPEG images with the built-in PDF.js decoder in `src/core/jpg.js`
2020-05-23 15:33:46 +02:00
Jonas Jenwald
56ebf01ae0 Avoid hanging the worker-thread for CMap data with ridiculously large ranges (issue 11922)
This patch was inspired by ad2b64f124/xpdf/CharCodeToUnicode.cc (L480-L484)
2020-05-22 15:23:17 +02:00
Jonas Jenwald
0351852d74 [api-minor] Decode all JPEG images with the built-in PDF.js decoder in src/core/jpg.js
Currently some JPEG images are decoded by the built-in PDF.js decoder in `src/core/jpg.js`, while others attempt to use the browser JPEG decoder. This inconsistency seem unfortunate for a number of reasons:

 - It adds, compared to the other image formats supported in the PDF specification, a fair amount of code/complexity to the image handling in the PDF.js library.

 - The PDF specification support JPEG images with features, e.g. certain ColorSpaces, that browsers are unable to decode natively. Hence, determining if a JPEG image is possible to decode natively in the browser require a non-trivial amount of parsing. In particular, we're parsing (part of) the raw JPEG data to extract certain marker data and we also need to parse the ColorSpace for the JPEG image.

 - While some JPEG images may, for all intents and purposes, appear to be natively supported there's still cases where the browser may fail to decode some JPEG images. In order to support those cases, we've had to implement a fallback to the PDF.js JPEG decoder if there's any issues during the native decoding. This also means that it's no longer possible to simply send the JPEG image to the main-thread and continue parsing, but you now need to actually wait for the main-thread to indicate success/failure first.
   In practice this means that there's a code-path where the worker-thread is forced to wait for the main-thread, while the reverse should *always* be the case.

 - The native decoding, for anything except the *simplest* of JPEG images, result in increased peak memory usage because there's a handful of short-lived copies of the JPEG data (see PR 11707).
Furthermore this also leads to data being *parsed* on the main-thread, rather than the worker-thread, which you usually want to avoid for e.g. performance and UI-reponsiveness reasons.

 - Not all environments, e.g. Node.js, fully support native JPEG decoding. This has, historically, lead to some issues and support requests.

 - Different browsers may use different JPEG decoders, possibly leading to images being rendered slightly differently depending on the platform/browser where the PDF.js library is used.

Originally the implementation in `src/core/jpg.js` were unable to handle all of the JPEG images in the test-suite, but over the last couple of years I've fixed (hopefully) all of those issues.
At this point in time, there's two kinds of failure with this patch:

 - Changes which are basically imperceivable to the naked eye, where some pixels in the images are essentially off-by-one (in all components), which could probably be attributed to things such as different rounding behaviour in the browser/PDF.js JPEG decoder.
   This type of "failure" accounts for the *vast* majority of the total number of changes in the reference tests.

 - Changes where the JPEG images now looks *ever so slightly* blurrier than with the native browser decoder. For quite some time I've just assumed that this pointed to a general deficiency in the `src/core/jpg.js` implementation, however I've discovered when comparing two viewers side-by-side that the differences vanish at higher zoom levels (usually around 200% is enough).
   Basically if you disable [this downscaling in canvas.js](8fb82e939c/src/display/canvas.js (L2356-L2395)), which is what happens when zooming in, the differences simply vanish!
   Hence I'm pretty satisfied that there's no significant problems with the `src/core/jpg.js` implementation, and the problems are rather tied to the general quality of the downscaling algorithm used. It could even be seen as a positive that *all* images now share the same downscaling behaviour, since this actually fixes one old bug; see issue 7041.
2020-05-22 00:22:48 +02:00
Jonas Jenwald
dda6626f40 Attempt to cache repeated images at the document, rather than the page, level (issue 11878)
Currently image resources, as opposed to e.g. font resources, are handled exclusively on a page-specific basis. Generally speaking this makes sense, since pages are separate from each other, however there's PDF documents where many (or even all) pages actually references exactly the same image resources (through the XRef table). Hence, in some cases, we're decoding the *same* images over and over for every page which is obviously slow and wasting both CPU and memory resources better used elsewhere.[1]

Obviously we cannot simply treat all image resources as-if they're used throughout the entire PDF document, since that would end up increasing memory usage too much.[2]
However, by introducing a `GlobalImageCache` in the worker we can track image resources that appear on more than one page. Hence we can switch image resources from being page-specific to being document-specific, once the image resource has been seen on more than a certain number of pages.

In many cases, such as e.g. the referenced issue, this patch will thus lead to reduced memory usage for image resources. Scrolling through all pages of the document, there's now only a few main-thread copies of the same image data, as opposed to one for each rendered page (i.e. there could theoretically be *twenty* copies of the image data).
While this obviously benefit both CPU and memory usage in this case, for *very* large image data this patch *may* possibly increase persistent main-thread memory usage a tiny bit. Thus to avoid negatively affecting memory usage too much in general, particularly on the main-thread, the `GlobalImageCache` will *only* cache a certain number of image resources at the document level and simply fallback to the default behaviour.

Unfortunately the asynchronous nature of the code, with ranged/streamed loading of data, actually makes all of this much more complicated than if all data could be assumed to be immediately available.[3]

*Please note:* The patch will lead to *small* movement in some existing test-cases, since we're now using the built-in PDF.js JPEG decoder more. This was done in order to simplify the overall implementation, especially on the main-thread, by limiting it to only the `OPS.paintImageXObject` operator.

---
[1] There's e.g. PDF documents that use the same image as background on all pages.

[2] Given that data stored in the `commonObjs`, on the main-thread, are only cleared manually through `PDFDocumentProxy.cleanup`. This as opposed to data stored in the `objs` of each page, which is automatically removed when the page is cleaned-up e.g. by being evicted from the cache in the default viewer.

[3] If the latter case were true, we could simply check for repeat images *before* parsing started and thus avoid handling *any* duplicate image resources.
2020-05-21 18:13:45 +02:00
Jonas Jenwald
e1f340a0c2 Use the ESLint no-restricted-syntax rule to ensure that assert is always called with two arguments
Having `assert` calls without a message string isn't very helpful when debugging, and it turns out that it's easy enough to make use of ESLint to enforce better `assert` call-sites.
In a couple of cases the `assert` calls were changed to "regular" throwing of errors instead, since that seemed more appropriate.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-restricted-syntax
2020-05-05 13:40:05 +02:00
Jonas Jenwald
4aabd063fc Gracefully handle annotation parsing errors in Page.getOperatorList (issue 11871)
This should ensure that a page will always render successfully, even if there's errors during the Annotation fetching/parsing.
Additionally the `OperatorList.addOpList` method is also adjusted to ignore invalid data, to make it slightly more robust.
2020-05-04 17:09:48 +02:00
Tim van der Meij
9ebb18f505
Implement a command line flag to skip Chrome when running tests
To save time or resources during development it can be useful to run
tests only in Firefox. Previously this could be done by editing the
browser manifest file, but since that file is no longer used for
Puppeteer, this command line flag replaces it. For example, executing
`gulp unittest --noChrome` will only run the unit tests in Firefox.
2020-04-27 13:03:12 +02:00
Tim van der Meij
4834a276fd
Introduce Puppeteer for handling browsers during tests
This commit replaces our own infrastructure for handling browsers during
tests with Puppeteer. Using our own infrastructure for this had a few
downsides:

- It has proven to not always be reliable, especially when closing the
  browser, causing failures on the bots because browsers were still
  running even though they should have been stopped. Puppeteer should do
  a better job with this because it uses the browser's test built-in
  instrumentation tools for this (the devtools protocol) which our code
  didn't. This also means that we don't have to pass
  parameters/preferences to tweak browser behavior anymore.
- It requires the browsers under test to be installed on the system,
  whereas Puppeteer downloads the browsers before the test. This means
  that setup is much easier (no more manual installations and browser
  manifest files) as well as testing with different browser versions
  (since they can be provisioned on demand). Moreover, this ensures that
  contributors always run the tests in both Firefox and Chrome,
  regardless of which browsers they have installed locally.
- It's all code we have to maintain, so Puppeteer abstracts away how the
  browsers start/stop for us so we don't have to keep that code.

By default, Puppeteer only installs one browser during installation,
hence the need for a post-install script to install the second browser.
This requires `cross-env` to make passing the environment variable work
on both Linux and Windows.
2020-04-27 13:03:12 +02:00
Tim van der Meij
d86720b7dc
Identify browsers using the name instead of the path
The other testing code already uses the name of the browser as the
unique identifier, so I don't see a good reason to not use that for
identifying browsers to quit as well. Doing so simplifies the (already
somewhat complex) testing logic and ensures that we can use existing
functionality (such as the `getSession` function) to retrieve sessions.
2020-04-26 14:42:17 +02:00
Jonas Jenwald
cdc60402f6 [api-minor] Change PageViewport to throw when the rotation is not a multiple of 90 degrees
As evident from the code, `PageViewport` only supports[1] `rotation` values which are a multiple of 90 degrees. Besides it being somewhat difficult to imagine meaningful use-cases for a non-multiple of 90 degrees `rotation`, the code also becomes both simpler and more efficient by not having to consider arbitrary `rotation` values.

However, any invalid rotation will *silently* fallback to assume zero `rotation` which probably isn't great for e.g. `PDFPageProxy.getViewport` in the API. Hence this patch, which will now enforce that only valid `rotation` values are accepted.

---
[1] As far as I can tell, from looking through the history, nothing else has ever been supported either.
2020-04-22 15:19:13 +02:00
Jonas Jenwald
1cc3dbb694 Enable the dot-notation ESLint rule
*Please note:* These changes were done automatically, using the `gulp lint --fix` command.

This rule is already enabled in mozilla-central, see https://searchfox.org/mozilla-central/rev/567b68b8ff4b6d607ba34a6f1926873d21a7b4d7/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#103-104

The main advantage, besides improved consistency, of this rule is that it reduces the size of the code (by 3 bytes for each case). In the PDF.js code-base there's close to 8000 instances being fixed by the `dot-notation` ESLint rule, which end up reducing the size of even the *built* files significantly; the total size of the `gulp mozcentral` build target changes from `3 247 456` to `3 224 278` bytes, which is a *reduction* of `23 178` bytes (or ~0.7%) for a completely mechanical change.

A large number of these changes affect the (large) lookup tables used on the worker-thread, but given that they are still initialized lazily I don't *think* that the new formatting this patch introduces should undo any of the improvements from PR 6915.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/dot-notation
2020-04-17 12:24:46 +02:00
Tim van der Meij
96923eb2a6
Merge pull request #11805 from Snuffleupagus/issue-11794
Always skip over any additional, unexpected, RSTx (restart) markers in corrupt JPEG images (issue 11794)
2020-04-16 00:08:58 +02:00
Tim van der Meij
a7def05aa1
Merge pull request #11810 from Snuffleupagus/fromCodePoint-followup
A couple of small `String.fromCodePoint` improvements (PR 11698 and 11769 follow-up)
2020-04-16 00:08:16 +02:00
Jonas Jenwald
44b4a74f48 A couple of small String.fromCodePoint improvements (PR 11698 and 11769 follow-up)
- Add a reduced test-case for issue 11768, to prevent future regressions.
   (Given that PR 11769 is only a work-around, rather than a proper solution, it may not be entirely accurate for the issue to be closed as fixed.)

 - Add more validation of the charCode, as found by the heuristics, in `PartialEvaluator._buildSimpleFontToUnicode` to prevent future issues.
2020-04-15 13:45:08 +02:00
Jonas Jenwald
06f6f8719f Always skip over any additional, unexpected, RSTx (restart) markers in corrupt JPEG images (issue 11794) 2020-04-14 23:27:08 +02:00
Jonas Jenwald
746eaf3154 [api-minor] Fix the return value of PDFDocumentProxy.getViewerPreferences when no viewer preferences are present (PR 10738 follow-up)
This patch fixes yet another instalment in the never-ending series of "what the *bleep* was I thinking", by changing the `PDFDocumentProxy.getViewerPreferences` method to return `null` by default.
Not only is this method now consistent with many other API methods, for the data not present case, but it also avoids having to e.g. loop through an object to check if it's actually empty (note the old unit-test).
2020-04-14 23:25:50 +02:00
Jonas Jenwald
426945b480 Update Prettier to version 2.0
Please note that these changes were done automatically, using `gulp lint --fix`.

Given that the major version number was increased, there's a fair number of (primarily whitespace) changes; please see https://prettier.io/blog/2020/03/21/2.0.0.html
In order to reduce the size of these changes somewhat, this patch maintains the old "arrowParens" style for now (once mozilla-central updates Prettier we can simply choose the same formatting, assuming it will differ here).
2020-04-14 12:28:14 +02:00
Jonas Jenwald
91efde5246 Add a heuristic to scale even single-char text, when the horizontal/vertical scaling differs significantly (issue 11713)
At this point in time, compared to when the "ignore single-char" code was added, we *should* generally be doing a much better job of combining text into as few chunks as possible.
However, there's still bad cases where we're not able to combine text as much as one would like, which is why I'm *not* proposing to simply measure/scale all text. Instead this patch will to only measure/scale single-char text in cases where the horizontal/vertical scale is off significantly, since that's were you'd expect bad text-selection behaviour otherwise.

Note that most of the movement caused by this patch is with Type3 fonts, which is a somewhat special font type and one where our current text-selection behaviour is probably the least good.
2020-04-07 00:36:23 +02:00
Jonas Jenwald
938d519192 Create the glyph mapping correctly for composite Type1, i.e. CIDFontType0, fonts (issue 11740)
This updates `Type1Font.getGlyphMapping` with a code-path "borrowed" from `CFFFont.getGlyphMapping`.
2020-04-06 11:21:02 +02:00
Jonas Jenwald
710704508c Fail early, in modern GENERIC builds, if certain required browser functionality is missing (issue 11762)
With two kind of builds now being produced, with/without translation/polyfills, it's unfortunately somewhat easy for users to accidentally pick the wrong one.

In the case where a user would attempt to use a modern build of PDF.js in an older browser, such as e.g. IE11, the failure would be immediate when the code is loaded (given the use of unsupported ECMAScript features).
However in some browsers/environments, in particular Node.js, a modern PDF.js build may load correctly and thus *appear* to function, only to fail for e.g. certain API calls. To hopefully lessen the support burden, and to try and improve things overall, this patch adds checks to ensure that a modern build of PDF.js cannot be used in browsers/environments which lack native support for critical functionality (such as e.g. `ReadableStream`). Hence we'll fail early, with an error message telling users to pick an ES5-compatible build instead.

To ensure that we actually test things better especially w.r.t. usage of the PDF.js library in Node.js environments, the `gulp npm-test` task as used by Node.js/Travis was changed (back) to test an ES5-compatible build.
(Since the bots still test the code as-is, without transpilation/polyfills, this shouldn't really be a problem as far as I can tell.)
As part of these changes there's now both `gulp lib` and `gulp lib-es5` build targets, similar to e.g. the generic builds, which thanks to some re-factoring only required adding a small amount of code.

*Please note:* While it's probably too early to tell if this will be a widespread issue, it's possible that this is the sort of patch that *may* warrant being `git cherry-pick`ed onto the current beta version (v2.4.456).
2020-04-01 19:42:48 +02:00
Jonas Jenwald
664b79abe0 [api-minor] Remove the eventBusDispatchToDOM option/preference, and thus the general ability to dispatch "viewer components" events to the DOM
This functionality was only added to the default viewer for backwards compatibility and to support the various PDF viewer tests in mozilla-central, with the intention to eventually remove it completely.
While the different mozilla-central tests cannot be *easily* converted from DOM events, it's however possible to limit that functionality to only MOZCENTRAL builds *and* when tests are running.

Rather than depending of the re-dispatching of internal events to the DOM, the default viewer can instead be used in e.g. the following way:
```javascript
document.addEventListener("webviewerloaded", function() {
  PDFViewerApplication.initializedPromise.then(function() {
    // The viewer has now been initialized, and its properties can be accessed.

    PDFViewerApplication.eventBus.on("pagerendered", function(event) {
      console.log("Has rendered page number: " + event.pageNumber);
    });
  });
});
```
2020-03-29 12:24:46 +02:00
Jonas Jenwald
fdfcde2b40 Remove a spurious console.log from the ChromiumBrowser function in test/webbrowser.js file
This looks entirely like something which was left-over from debugging, and that line hasn't been touched since PR 4515, especially considering that the corresponding branch in `FirefoxBrowser` doesn't print anything.
2020-03-25 11:57:12 +01:00
Jonas Jenwald
dcb16af968 Whitelist closure related cases to address the remaining no-shadow linting errors
Given the way that "classes" were previously implemented in PDF.js, using regular functions and closures, there's a fair number of false positives when the `no-shadow` ESLint rule was enabled.

Note that while *some* of these `eslint-disable` statements can be removed if/when the relevant code is converted to proper `class`es, we'll probably never be able to get rid of all of them given our naming/coding conventions (however I don't really see this being a problem).
2020-03-25 11:57:12 +01:00
Jonas Jenwald
1d2f787d6a Enable the ESLint no-shadow rule
This rule is *not* currently enabled in mozilla-central, but it appears commented out[1] in the ESLint definition file; see https://searchfox.org/mozilla-central/rev/c80fa7258c935223fe319c5345b58eae85d4c6ae/tools/lint/eslint/eslint-plugin-mozilla/lib/configs/recommended.js#238-239

Unfortunately this rule is, for fairly obvious reasons, impossible to `--fix` automatically (even partially) and each case thus required careful manual analysis.
Hence this ESLint rule is, by some margin, probably the most difficult one that we've enabled thus far. However, using this rule does seem like a good idea in general since allowing variable shadowing could lead to subtle (and difficult to find) bugs or at the very least confusing code.

Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-shadow

---
[1] Most likely, a very large number of lint errors have prevented this rule from being enabled thus far.
2020-03-25 11:56:05 +01:00
Tim van der Meij
475fa1f97f
Merge pull request #11744 from janpe2/cff-glyph-zero
The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"
2020-03-24 23:52:21 +01:00
Jani Pehkonen
a22c0eab48 The first glyph in CFF CIDFonts must be named 0 instead of ".notdef"
Fixes #11718 in which the `ff` ligature glyph is at index zero in a CFF font. Beacuse this is a CIDFont, glyph names are CIDs, which are integers. Thus the string `".notdef"` is not correct. The rest of the charset data is already parsed correctly as integers when the boolean argument `cid` is true.
2020-03-24 15:56:50 +02:00
Jonas Jenwald
66ee8f5acd Remove variable shadowing from the JavaScript files in the test/unit/ folder
*This is part of a series of patches that will try to split PR 11566 into smaller chunks, to make reviewing more feasible.*

Once all the code has been fixed, we'll be able to eventually enable the ESLint no-shadow rule; see https://eslint.org/docs/rules/no-shadow
2020-03-24 10:44:17 +01:00