Currently these methods accept a large number of parameters, which creates quite unwieldy call-sites. When invoking them, you have to remember not only what arguments to supply, but also the correct order, to avoid runtime errors.
Furthermore, since some of the parameters are optional, you also have to remember to pass e.g. `null` or `undefined` for those ones.
Also, adding new parameters to these methods (which happens occasionally), often becomes unnecessarily tedious (based on personal experience).
Please note that I do *not* think that we need/should convert *every* single method in `evaluator.js` (or elsewhere in `/core` files) to take parameter objects. However, in my opinion, once a method starts relying on approximately five parameter (or even more), passing them in individually becomes quite cumbersome.
With these changes, I obviously needed to update the `evaluator_spec.js` unit-tests. The main change there, except the new method signatures[1], is that it's now re-using *one* `PartialEvalutor` instance, since I couldn't see any compelling reason for creating a new one in every single test.
*Note:* If this patch is accepted, my intention is to (time permitting) see if it makes sense to convert additional methods in `evaluator.js` (and other `/core` files) in a similar fashion, but I figured that it'd be a good idea to limit the initial scope somewhat.
---
[1] A fun fact here, note how the `PartialEvaluator` signature used in `evaluator_spec.js` wasn't even correct in the current `master`.
Please see http://eslint.org/docs/rules/object-shorthand.
Unfortunately, based on commit 9276d1dcd9, it seems that we still need to maintain compatibility with old Node.js versions, hence certain files/directories that are executed in Node.js are currently exempt from this rule.
Furthermore, since the files specific to the Chromium extension are not run through Babel, the `/extensions/chromium/` directory is also exempt from this rule.
The test runner is automated, so if the default browser test is
performed, the browser hangs waiting for user input it never gets.
Disable the test to fix that.
Moreover, enable E10s now that it is mature. This may help with the
performance of the test runner as well.
Please refer to the JBIG2 standard, see https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-T.88-200002-I!!PDF-E&type=items.
In particular, section "6.3.5.3 Fixed templates and adaptive templates" mentions that the offsets should be *subtracted*; where the offsets are defined according to "Table 6" under section "6.3.2 Input parameters".
Fixes 7145.
Fixes 7308.
Fixes 7401.
Fixes 7850.
Fixes 8270.
With the exception of just one test-case, all the current `ui_utils` unit-tests can run successfully on Node.js (since most of them doesn't rely on the DOM).
To get this working, I had to first of all add a new `LIB` build flag such that `gulp lib` produces a `web/pdfjs.js` file that is able to load `pdf.js` successfully.
Second of all, since neither `document` nor `navigator` is available in Node.js, `web/ui_utils.js` was adjusted slightly to avoid errors.
The patch also changes the `defaultFilename` to use the ES6 default parameter notation, and fixes the formatting of the JSDoc comment.
Finally, since `getPDFFileNameFromURL` currently has no unit-tests, a few basic ones are added to avoid regressions.
[api-minor] Always allow e.g. rendering to continue even if there are errors, and add a `stopAtErrors` parameter to `getDocument` to opt-out of this behaviour (issue 6342, issue 3795, bug 1130815)
This patch implements support for line annotations. Other viewers only
show the popup annotation when hovering over the line, which may have
any orientation. To make this possible, we render an invisible line (SVG
element) over the line on the canvas that acts as the trigger for the
popup annotation. This invisible line has the same starting coordinates,
ending coordinates and width of the line on the canvas.
Other PDF readers, e.g. Adobe Reader and PDFium (in Chrome), will attempt to render as much of a page as possible even if there are errors present.
Currently we just bail as soon the first error is hit, which means that we'll usually not render anything in these cases and just display a blank page instead.
NOTE: This patch changes the default behaviour of the PDF.js API to always attempt to recover as much data as possible, even when encountering errors during e.g. `getOperatorList`/`getTextContent`, which thus improve our handling of corrupt PDF files and allow the default viewer to handle errors slightly more gracefully.
In the event that an API consumer wishes to use the old behaviour, where we stop parsing as soon as an error is encountered, the `stopAtErrors` parameter can be set at `getDocument`.
Fixes, inasmuch it's possible since the PDF files are corrupt, e.g. issue 6342, issue 3795, and [bug 1130815](https://bugzilla.mozilla.org/show_bug.cgi?id=1130815) (and probably others too).
Considering how extremely simple this patch turned out to be, I'm almost worried that I completely misunderstood why the current code looks like it does...
I happened to notice that the error handling wasn't that great, which I missed previously since there were no unit-tests for failure to load built-in CMap files.
Hence this patch, which improves the error handling *and* adds tests.
I really cannot understand why this change is necessary, since modern browsers such as Firefox and Chrome work just fine with the old code.
Hence this is patch is yet another "hack" that's needed just because IE apparently cannot just work like you'd expect.
For consistency, the Node factory used in the CMap unit-tests is changed as well.
Fixes 8193.
*My apologies for inadvertently breaking this in PR 8064; apparently we don't have any tests that cover this use-case :(*
Without this patch `getTextContent` will fail if called before `getOperatorList`, since loading of fonts during text-extraction may require fetching of built-in CMap files.
*Please note:* The `text` test added here, which uses an already existing PDF file, fails without this patch.
*After browsing through (a version of) the JPEG specification, see https://www.w3.org/Graphics/JPEG/itu-t81.pdf, I hope that this patch makes sense.*
Note that while issue 7828 became a problem after PR 7661, it isn't really a regression from than PR. The explanation is rather that we're now relying on `core/jpg.js` instead of the Native Image decoder in more situations than before, which thus exposed an *existing* issue in our JPEG decoder.
Another factor also seems to be that in many JPEG images, the DRI (Define Restart Interval) marker isn't present, in which case this bug won't manifest either.
According to https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=89 (at the bottom of the page):
"NOTE – The final restart interval may be smaller than the size specified by the DRI marker segment, as it includes only the number of MCUs remaining in the scan."
Furthermore, according to https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=39 (in the middle of the page):
"[...] If restart is enabled and the restart interval is defined to be Ri, each entropy-coded segment except the last one shall contain Ri MCUs. The last one shall contain whatever number of MCUs completes the scan."
Based on the above, it thus seem to me that we should simply ensure that we're not attempting to continue to parse Scan data once we've found all MCUs (Minimum Coded Unit) of the image.
Fixes 7828.
There's still some work necessary if we want to be able to run (even a subset of) the API unit-tests on Travis.
However, this patch could be considered a small first step, since the relevant unit-tests will now rely on a `CanvasFactory` rather than using `document.createElement('canvas')` directly.
This patch gets rid of the only case in the code-base where we're throwing a plain `string`, rather than an `Error`, which besides better/more consistent error handling also allows us to enable the [`no-throw-literal`](http://eslint.org/docs/rules/no-throw-literal) ESLint rule.
It appears that I accidentally broke this in PR 6065, sorry about that!
The issue in this particular PDF file is that there's `/Rotate` entries on different levels of the `/Pages` tree. We're supposed to use the `/Rotate` entry in the `/Page` dict (which is `0`), but because of an incorrect condition we instead ended up with the one from the `/Pages` dict (which is `180`).
Fixes 8125.
Even though the PDF specification does not state that `Opt` fields are
inheritable, in practice there are PDF generators that let annotations
inherit the options from a parent.
As discussed on IRC, we need to check all nodes at the *bottom* of the tree to ensure that we find the correct `Page` dict.
Furthermore, this patch also gets rid of the caching present in a previous version, since it's not clear if that really helps.
Note that this patch purposely adds an `eq` test, using a reduced test-case, so that we can be sure that the algorithm actually finds the correct `Page` dict for each `pageIndex`.
Fixes 8088.
[api-minor] Refactor fetching of built-in CMaps to utilize a factory on the `display` side instead, to allow users of the API to provide a custom CMap loading factory (e.g. for use with Node.js)
Currently the built-in CMap files are loaded in `src/core/cmap.js` using `XMLHttpRequest` directly. For some environments that might be a problem, hence this patch refactors that to instead use a factory to load built-in CMaps on the main thread and message the data to the worker thread.
This is inspired by other recent work, e.g. the addition of the `CanvasFactory`, and to a large extent on the IRC discussion starting at http://logs.glob.uno/?c=mozilla%23pdfjs&s=12+Oct+2016&e=12+Oct+2016#c53010.
This patch basically reverts one aspect of TrueType (3, 1) cmap parsing to the state prior to PR 4259. After that PR, a number of regressions occurred in this particular code-path, which necessitated a number of follow-ups such as PRs 5703, 5743, and 6425.
The empirical data suggests, at least to me, that we should always prefer a (3, 1) cmap for TrueType fonts when they have an encoding, regardless of the Symbolic font flag.
Obviously this patch passes all unit/font/reference tests locally, and I made sure that all the PRs mentioned above landed with test-cases included.
However, in my opinion, there's still a very real possibility that this patch could potentially cause new regressions.
Given that the PDF file in bug 1337429 has been broken for almost *three* years before anyone noticed, and considering that the code-path in question has been the source of numerous regressions, I do *not* intend to request uplift of this patch to previous Firefox versions (assuming that it's even accepted).
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1337429.
*Please note:* The rendering of the PDF file in issue 8061 first regressed in PR 7039, and then PR 7493 exacerbated the problem even further by causing an infinite loop.
In this particular case, when errors were encountered inside of the `Lexer.getObject` method *itself*, we didn't advance the stream position. This thus caused an inifinite loop in `parseCMap`, since the exact same character was then parsed over and over again.
Fixes 8061.
The `Driver._cleanup` method is removing all stylesheets between test runs, which causes "TypeError: styleElement.parentNode is null" console errors in `FontLoader.clear`.
As can also be seen during various tests, some of the changes I made in PR 7972 unfortunately causes console errors.
It seems that I didn't test this properly, since it *should* have been obvious to me that while tests are triggered using Node.js, the files in question are run within the *browser*.
My apologies for not testing this thoroughly, and for causing unnecessary churn in the code!
See http://eslint.org/docs/rules/brace-style.
Having the opening/closing braces on the same line can often make the code slightly more difficult to read, in particular for `if`/`else if` statements, compared to using new lines.
This patch also, for consistency with `mozilla-central`, enables the [`no-iterator`](http://eslint.org/docs/rules/no-iterator) rule. Note that this rule didn't require a single code change.
Other PDF viewers, e.g. Adobe Reader, seem to append `FileAttachment`s to their attachments views.
One obvious difference in PDF.js is that we cannot append all the annotations on document load, since that would require parsing *every* page. Despite that, it still seems like a good idea to add `FileAttachment`s, since it's thus possible to access all the various types of attachments from a single place.
*Note:* With the previous patch we display a notification when a `FileAttachment` is added to the sidebar, which thus makes appending the contents of these annotations to the sidebar slightly more visible/useful.
Please see http://eslint.org/docs/rules/no-unused-vars; note that this patch purposely uses the same rule options as in `mozilla-central`, such that it fixes part of issue 7957.
It wasn't, in my opinion, entirely straightforward to enable this rule compared to the already existing rules. In many cases a `var descriptiveName = ...` format was used (more or less) to document the code, and I choose to place the old variable name in a trailing comment to not lose that information.
I welcome feedback on these changes, since it wasn't always entirely easy to know what changes made the most sense in every situation.
Further adjust the heuristics used to detect OpenType font files with CFF data, to ensure that all Type0 fonts are handled the same way regardless of font Subtype (issue 7901)
Every other unit-test in `annotation_spec.js` is already only testing the annotation code. Hence it seems unnecessarily convoluted to make use of the API here, when we can (fairly) simply provide the necessary data explicitly as in all the other annotation unit-test.
We're currently making use of `uniquePrefix`/`idCounters` in multiple files, to create unique object id's, and adding a new occurrence of them requires some care to ensure that an object id isn't accidentally reused.
Furthermore, having to pass around multiple parameters as we currently do seem like something you want to avoid.
Instead, this patch adds a factory which means that there's only *one* thing that needs to be passed around. And since it's now only necessary to call a method in order to obtain a unique object id, the details are thus abstracted away at the call-sites which avoids accidental reuse of object id's.
To test that this works as expected a very simple `Page` unit-test is added, and the existing `Annotation layer` tests are also adjusted slightly.
This patch also removes the `UpdatePassword` message, in favour of using the `sendWithPromise` method of `MessageHandler`.
Furthermore, the patch also refactors the `BasePdfManager_updatePassword`/`BasePdfManager_passwordChanged` methods (in pdf_manager.js), and the `pdfManagerReady` function (in worker.js).
Changing this particular code makes me somewhat nervous about regressions, since PR 5770 necessitated the follow-up PR 6270.
However, the patch passes all tests added in those PRs (and obviously all other tests). Furthermore, I've manually checked all the issues/bugs referenced in PRs 5770 and 6270 without finding any issues.
**Please note:** This patch fixes *only* the font bug, not the SVG conversion, present on pages two and three of the PDF file in issue 7901.
Modern browsers support styling radio buttons and checkboxes with CSS.
This makes the implementation much easier, and the fallback for older
browsers is still decent.
I haven't got an example where the current code breaks, but given all the previous cases we've seen where PDF generators use indirect objects in Arrays it makes sense to fix this pro-actively.
I've modified the relevant unit-tests slightly, and they would *not* pass without the code changes in this patch.
*Note:* `Dict_getArray` only dereferences Array elements on the "top-level", to avoid recursion issues. Furthermore if you have to loop through the Array at the call-site anyway, then using `Dict_get` in combination with `XRef_fetchIfRef` is a tiny bit more efficient.
*Please note that most of the necessary code adjustments were made in PR 7890.*
ESLint has a number of advantageous properties, compared to JSHint. Among those are:
- The ability to find subtle bugs, thanks to more rules (e.g. PR 7881).
- Much more customizable in general, and many rules allow fine-tuned behaviour rather than the just the on/off rules in JSHint.
- Many more rules that can help developers avoid bugs, and a lot of rules that can be used to enforce a consistent coding style. The latter should be particularily useful for new contributors (and reduce the amount of stylistic review comments necessary).
- The ability to easily specify exactly what rules to use/not to use, as opposed to JSHint which has a default set. *Note:* in future JSHint version some of the rules we depend on will be removed, according to warnings in http://jshint.com/docs/options/, so we wouldn't be able to update without losing lint coverage.
- More easily disable one, or more, rules temporarily. In JSHint this requires using a numeric code, which isn't very user friendly, whereas in ESLint the rule name is simply used instead.
By default there's no rules enabled in ESLint, but there are some default rule sets available. However, to prevent linting failures if we update ESLint in the future, it seemed easier to just explicitly specify what rules we want.
Obviously this makes the ESLint config file somewhat bigger than the old JSHint config file, but given how rarely that one has been updated over the years I don't think that matters too much.
I've tried, to the best of my ability, to ensure that we enable the same rules for ESLint that we had for JSHint. Furthermore, I've also enabled a number of rules that seemed to make sense, both to catch possible errors *and* various style guide violations.
Despite the ESLint README claiming that it's slower that JSHint, https://github.com/eslint/eslint#how-does-eslint-performance-compare-to-jshint, locally this patch actually reduces the runtime for `gulp` lint (by approximately 20-25%).
A couple of stylistic rules that would have been nice to enable, but where our code currently differs to much to make it feasible:
- `comma-dangle`, controls trailing commas in Objects and Arrays (among others).
- `object-curly-spacing`, controls spacing inside of Objects.
- `spaced-comment`, used to enforce spaces after `//` and `/*. (This is made difficult by the fact that there's still some usage of the old preprocessor left.)
Rules that I indend to look into possibly enabling in follow-ups, if it seems to make sense: `no-else-return`, `no-lonely-if`, `brace-style` with the `allowSingleLine` parameter removed.
Useful links:
- http://eslint.org/docs/user-guide/configuring
- http://eslint.org/docs/rules/
I just realized that none of our current unit-tests cover this particular part of the Page Label parsing code, hence this patch adjusts an existing test PDF to include a "St" entry in the Page Label dictionary.
*This patch fixes something that I noticed while debugging https://bugzilla.mozilla.org/show_bug.cgi?id=1308536.*
The PDF file contains a font called "NuptialScript", which unfortunately is not embedded. Since that is a non-standard font we will not be able to render it entirely correct. However, by adding "NuptialScript" to the `getNonStdFontMap`, we can at least improve the rendering slightly by using an italic (serif) fallback font.
This patch adds support for non-embedded Arial Black fonts, that use a `Arial-Black...` format for the font names.
Also, this patch changes `canvas.js` such that we always render Arial Black fonts with the maximum weight, which actually improves a number of existing test-cases. This should thus explain the test "failures", which are clear improvements compared with e.g. Adobe Reader.
Fixes 7835.
By only allowing very specific type of `JavaScript` actions, and also utilizing the existing `URL` validation, this patch shouldn't pose too much risk.
Fixes one of the points in issue 3897 (with the PDF file taken from issue 3438).
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=843699 (probably, since that bug doesn't contain a test-case).
It seems that certain bad PDF generators can create badly encoded "Prefix" entries for Page Labels, one example being http://ukjewishfilm.org/wp-content/uploads/2015/09/Jewish-Film-Festival-Programme-ONLINE.pdf.
Unfortunately I didn't come across such a PDF file while adding the API support for Page Labels, but with them now being used in the viewer I just found this issue. With this patch, we now display the Page Labels in the same way as Adobe Reader.