*After browsing through (a version of) the JPEG specification, see https://www.w3.org/Graphics/JPEG/itu-t81.pdf, I hope that this patch makes sense.*
Note that while issue 7828 became a problem after PR 7661, it isn't really a regression from than PR. The explanation is rather that we're now relying on `core/jpg.js` instead of the Native Image decoder in more situations than before, which thus exposed an *existing* issue in our JPEG decoder.
Another factor also seems to be that in many JPEG images, the DRI (Define Restart Interval) marker isn't present, in which case this bug won't manifest either.
According to https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=89 (at the bottom of the page):
"NOTE – The final restart interval may be smaller than the size specified by the DRI marker segment, as it includes only the number of MCUs remaining in the scan."
Furthermore, according to https://www.w3.org/Graphics/JPEG/itu-t81.pdf#page=39 (in the middle of the page):
"[...] If restart is enabled and the restart interval is defined to be Ri, each entropy-coded segment except the last one shall contain Ri MCUs. The last one shall contain whatever number of MCUs completes the scan."
Based on the above, it thus seem to me that we should simply ensure that we're not attempting to continue to parse Scan data once we've found all MCUs (Minimum Coded Unit) of the image.
Fixes 7828.
I happened to notice that some inequalities had the wrong order, and was surprised since I thought that the `yoda` rule should have caught that.
However, reading http://eslint.org/docs/rules/yoda#options a bit more closely than previously, it's quite obvious that the `onlyEquality` option does *exactly* what its name suggests. Hence I think that it makes sense to adjust the options such that only ranges are allowed instead.
*This is something that I noticed while working on PR 8126, which is (more) fallout from PR 6065.*
In general, it's actually *not* correct to return `Dict.empty` as the default value for non-existent properties. Please note that a prior PR, see https://github.com/mozilla/pdf.js/pull/5957#issuecomment-103112698, asked for that behaviour but I don't think that's right.
Obviously for properties that are (or should) be `Dict`s it makes sense, however certain properties can be e.g. Strings or Arrays instead. In the latter case, returning `Dict.empty` is just plain wrong, and it's quite fascinating that this hasn't caused any errors in practice. (The existing validation in the various getters has actually saved us here.)
Also, when looking at this code again, it seemed unnecessary to duplicate the `MAX_LOOP_COUNT` check since we could just return immediately instead.
When using content security headers to restrict connections to the same origin,
you may not make connections to `example.com`. This feature detection also
works with a request to the current location.
It appears that I accidentally broke this in PR 6065, sorry about that!
The issue in this particular PDF file is that there's `/Rotate` entries on different levels of the `/Pages` tree. We're supposed to use the `/Rotate` entry in the `/Page` dict (which is `0`), but because of an incorrect condition we instead ended up with the one from the `/Pages` dict (which is `180`).
Fixes 8125.
Ideally, the `Annotation` class should not have anything to do with the
page's operator list. How annotations are added to the page's operator
list is logic that belongs in `src/core/document.js` instead where the
operator list is constructed.
Moreover, some comments have been added to clarify the intent of the
code.
Even though the PDF specification does not state that `Opt` fields are
inheritable, in practice there are PDF generators that let annotations
inherit the options from a parent.
This fixes something that I noticed while working with the code in `Catalog.getPageDict` when debugging issue 8088.
Note that while I don't have an example where this patch really matters, given that e.g. `PartialEvaluator.hasBlendModes` depends on the `objId` to avoid cyclic references this patch could potentially help for some PDF files.
As discussed on IRC, we need to check all nodes at the *bottom* of the tree to ensure that we find the correct `Page` dict.
Furthermore, this patch also gets rid of the caching present in a previous version, since it's not clear if that really helps.
Note that this patch purposely adds an `eq` test, using a reduced test-case, so that we can be sure that the algorithm actually finds the correct `Page` dict for each `pageIndex`.
Fixes 8088.
[api-minor] Refactor fetching of built-in CMaps to utilize a factory on the `display` side instead, to allow users of the API to provide a custom CMap loading factory (e.g. for use with Node.js)
Currently the built-in CMap files are loaded in `src/core/cmap.js` using `XMLHttpRequest` directly. For some environments that might be a problem, hence this patch refactors that to instead use a factory to load built-in CMaps on the main thread and message the data to the worker thread.
This is inspired by other recent work, e.g. the addition of the `CanvasFactory`, and to a large extent on the IRC discussion starting at http://logs.glob.uno/?c=mozilla%23pdfjs&s=12+Oct+2016&e=12+Oct+2016#c53010.
Previously, we had a function called `getDefaultAppearance`. This name,
however, is misleading as the method gets the normal appearance (in the
`N` entry) and not the default appearance (in the `DA` entry). Moreover,
it was not entirely clear how it works just from reading the code. It
primarily lacks comments and explicit error case handling.
This patch improves the situation by fixing the issues mentioned above
and making this function a proper method of the `Annotation` class, just
like e.g., `setColor` and `setBorderStyle`.
This patch basically reverts one aspect of TrueType (3, 1) cmap parsing to the state prior to PR 4259. After that PR, a number of regressions occurred in this particular code-path, which necessitated a number of follow-ups such as PRs 5703, 5743, and 6425.
The empirical data suggests, at least to me, that we should always prefer a (3, 1) cmap for TrueType fonts when they have an encoding, regardless of the Symbolic font flag.
Obviously this patch passes all unit/font/reference tests locally, and I made sure that all the PRs mentioned above landed with test-cases included.
However, in my opinion, there's still a very real possibility that this patch could potentially cause new regressions.
Given that the PDF file in bug 1337429 has been broken for almost *three* years before anyone noticed, and considering that the code-path in question has been the source of numerous regressions, I do *not* intend to request uplift of this patch to previous Firefox versions (assuming that it's even accepted).
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1337429.
*Please note:* The rendering of the PDF file in issue 8061 first regressed in PR 7039, and then PR 7493 exacerbated the problem even further by causing an infinite loop.
In this particular case, when errors were encountered inside of the `Lexer.getObject` method *itself*, we didn't advance the stream position. This thus caused an inifinite loop in `parseCMap`, since the exact same character was then parsed over and over again.
Fixes 8061.
Note that I initially tried to add this as a parameter to the `PDFPageProxy.render` method, such that it could be passed to `PartialEvaluator.getOperatorList`.
However, given all the different code-paths that call `getOperatorList` (there's a bunch only in `annotation.js`), this seemed to very quickly become unwieldy and thus difficult to maintain compared to simply using the existing `evaluatorOptions`.
See http://eslint.org/docs/rules/brace-style.
Having the opening/closing braces on the same line can often make the code slightly more difficult to read, in particular for `if`/`else if` statements, compared to using new lines.
This patch also, for consistency with `mozilla-central`, enables the [`no-iterator`](http://eslint.org/docs/rules/no-iterator) rule. Note that this rule didn't require a single code change.
This property was added all the way back in PR 542, but hasn't actually been relied upon ever since PR 692.
Note that there's a `isStream()` utility function which replaced the property years ago, hence the `isStream` property is now dead code.
Please see http://eslint.org/docs/rules/no-unused-vars; note that this patch purposely uses the same rule options as in `mozilla-central`, such that it fixes part of issue 7957.
It wasn't, in my opinion, entirely straightforward to enable this rule compared to the already existing rules. In many cases a `var descriptiveName = ...` format was used (more or less) to document the code, and I choose to place the old variable name in a trailing comment to not lose that information.
I welcome feedback on these changes, since it wasn't always entirely easy to know what changes made the most sense in every situation.
Given the nature of `EOF` and `isEOF`, it seems to me that they really ought to be placed in `core/primitives.js` instead.
In general, it doesn't seem great to have to depend on the entire `core/parser.js` file for such simple primitives/helper functions.
In particular, while `core/ps_parser.js` is completely separate from `core/parser.js` with regards to its function, it still depends on the latter for just *one* primitive.
Note that compared to e.g. PR 7389, this will not reduce the number of dependencies for `core/ps_parser`, however the new dependency IMHO makes more sense.
PR 7322 added the `PdfJsNetwork.jsm` file, instead of the general `src/core/network.js` file for the Firefox addon. However, `make.js` wasn't updated to actually stop including the now obsolete network file.
Please see http://eslint.org/docs/rules/spaced-comment.
Note that the exceptions added for `line` comments are intended to still allow use of the old preprocessor without linting errors.
Also, I took the opportunity to improve the grammar slightly (w.r.t. capitalization and punctuation) for comments touched in the patch.
Further adjust the heuristics used to detect OpenType font files with CFF data, to ensure that all Type0 fonts are handled the same way regardless of font Subtype (issue 7901)
We're currently making use of `uniquePrefix`/`idCounters` in multiple files, to create unique object id's, and adding a new occurrence of them requires some care to ensure that an object id isn't accidentally reused.
Furthermore, having to pass around multiple parameters as we currently do seem like something you want to avoid.
Instead, this patch adds a factory which means that there's only *one* thing that needs to be passed around. And since it's now only necessary to call a method in order to obtain a unique object id, the details are thus abstracted away at the call-sites which avoids accidental reuse of object id's.
To test that this works as expected a very simple `Page` unit-test is added, and the existing `Annotation layer` tests are also adjusted slightly.
Using `else` after `return` is not necessary, and can often lead to unnecessarily cluttered code. By using the `no-else-return` rule in ESLint we can avoid this pattern, see http://eslint.org/docs/rules/no-else-return.