This commit is the first step towards implementing parsing for the
appearance streams of annotations.
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Co-authored-by: Tim van der Meij <timvandermeij@gmail.com>
The font tests use Jasmine too, so while they are technically unit
tests, it's a bit confusing to see `Started unit tests` when the font
tests are run on the bots.
This should really have been included in PR 9868, since it will help ensure that the `URL` constructor is correctly imported/exported by `src/shared/util.js`.
There was a (somewhat) recent question on IRC about accessing the linearization status of a PDF document, and this patch contains a simple way to expose that through already existing API methods.
Please note that during setup/parsing in `PDFDocument` the linearization data is already being fetched and parsed, provided of course that it exists. Hence this patch will *not* cause any additional data to be loaded.
With the new XML parser, see PR 9573, the referenced PDF file now causes `getMetadata` to fail when incomplete XML tags are encountered. This provides a simple, and hopefully generally useful, work-around that may also help prevent future bugs.
(Without being able to reproduce nor even understand the other (non XML) errors mentioned in issue 8884, I'd say that this patch is enough to close that one as fixed.)
Currently if `RenderTask.cancel` is called *immediately* after rendering was started, then by the time that `InternalRenderTask.initializeGraphics` is called rendering will already have been cancelled.
However, we're still inserting the canvas into the `canvasInRendering` map, thus breaking any future attempts at re-rendering using the same canvas. Considering that `InternalRenderTask.cancel` always removes the canvas from the map, I cannot imagine that we'd ever want to re-add it *after* rendering was cancelled (it was likely just a simple oversight in PR 8519).
Fixes 9456.
This wasn't included in PR 9245, since all the API options were still global at that time.
Writing the unit-tests also uncovered an issue with `getOperatorList` not starting the "Page Request" timer.
Incomplete path operators, in particular, can result in fairly chaotic rendering artifacts, as can be observed on page four of the referenced PDF file.
The initial (naive) solution that was attempted, was to simply throw a `FormatError` as soon as any invalid (i.e. too short) operator was found and rely on the existing `ignoreErrors` code-paths. However, doing so would have caused regressions in some files; see the existing `issue2391-1` test-case, which was promoted to an `eq` test to help prevent future bugs.
Hence this patch, which adds special handling for invalid path operators since those may cause quite bad rendering artifacts.
You could, in all fairness, argue that the patch is a handwavy solution and I wouldn't object. However, given that this only concerns *corrupt* PDF files, the way that PDF viewers (PDF.js included) try to gracefully deal with those could probably be described as a best-effort solution anyway.
This patch also adjusts the existing `warn`/`info` messages to print the command name according to the PDF specification, rather than an internal PDF.js enumeration value. The former should be much more useful for debugging purposes.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1443140.
With the current code line-breaks are accepted not just after an operator, but after a decimal point as well. When looking at this again, the latter case seems prone to cause false positives and might also interfere with subsequent patches.
Hence this is code is adjusted to actually do what the original commit message says, and nothing more.
The built-in image decoders are already using `Uint8ClampedArray` when returning data, and this patch simply extends that to the rest of the image/colorspace code.
As far as I can tell, the only reason for using manual clamping/rounding in the first place was because TypedArrays used to be polyfilled (using regular arrays). And trying to polyfill the native clamping/rounding would probably have been had too much overhead, but given that TypedArray support is required in PDF.js version `2.0` that's no longer a concern.
*Please note:* Because of different rounding behaviour, basically `Math.round` in `Uint8ClampedArray` respectively `Math.floor` in the old code, there will be very slight movement in quite a few existing test-cases. However, the changes should be imperceivable to the naked eye, given that the absolute difference is *at most* `1` for each RGB component when comparing `master` and this patch (see also the updated expectation values in the unit-tests).
The built-in image decoders are already returning data as `Uint8ClampedArray`, and subsequently the JPEG/JBIG2/JPX streams are as well. However, for general streams we obviously don't want to force the use of `Uint8ClampedArray` unless an "Image" is actually being decoded.
Hence this patch, which adds a parameter that allows the caller of the `getBytes`/`peekBytes` methods to force a `Uint8ClampedArray` (rather than a `Uint8Array`) to be returned.
The various classes have `this._errored` and `this._reason` properties, where the first one is a boolean indicating if an error was encountered and the second one contains the actual `Error` (or `null` initially).
In practice this means that errors are basically tracked *twice*, rather than just once. This kind of double-bookkeeping is generally a bad idea, since it's quite easy for the properties to (accidentally) get into an inconsistent state whenever the relevant code is modified.
Rather than using a separate boolean, we can just as well check the "error" property directly (since `null` is falsy).
---
Somewhat unrelated to this patch, but `src/display/node_stream.js` is currently *not* handling errors in a consistent or even correct way; compared with `src/display/network.js` and `src/display/fetch_stream.js`.
Obviously using the `createResponseStatusError` utility function, from `src/display/network_utils.js`, might not make much sense in a Node.js environment. However at the *very* least it seems that `MissingPDFException`, or `UnknownErrorException` when one cannot tell that the PDF file is "missing", should be manually thrown.
As is, the API (i.e. `getDocument`) is not returning the *expected* errors when loading fails in Node.js environments (as evident from the `pending` API unit-test).
There's no good reason, as far as I can tell, to duplicate the functionality of the `LoopbackPort` in the unit-tests. The only difference between the implementations is that `LoopbackPort` mimics the (native) structured cloning, however that shouldn't matter here since the tests are only sending "simple" data (strings respectively arrays with numbers).
Furthermore the patch also changes `LoopbackPort` to default to using "structured cloning" and deferred invocation of the listeners, since native typed array support is now a requirement for using the PDF.js library.
The `MessageHandler` itself, and its assorted helper functions, are currently the single largest[1] piece of code in the `src/shared/util.js` file. By moving this code into its own file, `src/shared/util.js` thus becomes smaller and more manageable.
Compared to running the unit-tests in "regular" browsers, where any console output won't get mixed up with test output, in Node.js/Travis the test output looks quite noisy.
By ignoring `info`/`warn` calls, when running unit-tests in Node.js/Travis, the test output is a lot smaller not to mention that any *actual* failures are more easily spotted.
A couple of basic unit-tests are added, and a manual `isLandscape` check (in `web/base_viewer.js`) is also converted to use the helper function instead.
Jasmine had a major version bump and required a few minor changes in our
booting code. Most notably, using `pending` in a `describe` block is no
longer supported, so we can only return early there. On the positive
side, the unit tests now run in a random order by default, which
eliminates any dependencies between unit tests.
Note that upgrading to Webpack 4 is out of scope for this patch since
the bots cannot work well with the newly generated bundles (both
browsers on both bots do not react within 120 seconds). Webpack 4 is not
faster for us than Webpack 3, so for now there is no need to upgrade.
The `getPageSizeInches` method was implemented on `PDFDocumentProxy`, which seems conceptually wrong since the size property isn't global to the document but rather specific to each page. Hence the method is moved into `PDFPageProxy`, as `get pageSizeInches` instead to address this.
Despite the fact that new API functionality was implemented, no unit-tests were added. To prevent issues later on, we should *always* ensure that new functionality has at least some test-coverage; something that this patch also takes care of.
The new `PDFDocumentProperties._parsePageSize` method seemed unnecessary convoluted. Furthermore, in the "no data provided"-case it even returned incorrect data (an array, rather than the expected object).
Finally, the fallback strings didn't actually agree with the `en-US` locale. This inconsistency doesn't look too great, and it's thus addressed here as well.
This function combines the logic of two separate methods into one.
The loop limit is also a good thing to have for the calls in
`src/core/annotation.js`.
Moreover, since this is important functionality, a set of unit tests and
documentation is added.
With PDF.js version `2.0` we'll only support browsers with built-in `TypedArray` functionality, hence there doesn't seem to be any good reason not to implement this now.
Fixes 4888.
This removes the `PDFJS.externalLinkTarget`/`PDFJS.externalLinkRel` dependency from the viewer components, but please note that as a *temporary* solution the default viewer still uses it.
Despite this patch removing the `disableWorker` option itself, please note that we'll still fallback to loading the worker file(s) on the main-thread when running in environments without proper Web Worker support.
Furthermore it's still possible, even with this patch, to force the use of fake workers by manually loading the necessary file using a `<script>` tag on the main-thread.[1]
That way, the functionality of the now removed `SINGLE_FILE` build target and the resulting `build/pdf.combined.js` file can still be achieved simply by adding e.g. `<script src="build/pdf.worker.js"></script>` to the HTML (obviously with the path adjusted as needed).
Finally note that the `disableWorker` option is a performance footgun, and unfortunately many existing third-party examples actually use it without providing any sort of warning/justification.
---
[1] This approach is used in the default viewer, since certain kind of debugging may be easier if the code is running directly on the main-thread.
This method returns the currently used `workerSrc`, which thus allows obtaining the fallback `workerSrc` value (e.g. when the option wasn't set by the user).
Remove "returns null when content disposition is form-data".
The name of the test is already misleading: It suggests that
the return value is null if the Content-Disposition starts with
"form-data". This is not the case, anything with the "filename"
parameter is accepted.
So, to correct this, one would have to rephrase the test description to
"returns null when content disposition has no filename".
But this is already tested by the test called
"gets the filename from the response header".
So, remove the test.
This patch updates the `IPDFStreamReader` interface and ensures that the interface/implementation of `network.js`, `fetch_stream.js`, `node_stream.js`, and `transport_stream.js` all match properly.
The unit-tests are also adjusted, to more closely replicate the actual behaviour of the various actual `IPDFStreamReader` implementations.
Finally, this patch adjusts the use of the Content-Disposition filename when setting the title in the viewer, and adds `PDFDocumentProperties` support as well.
The `fetch` API is only supported for http(s), even in Chrome extensions.
Because of this limitation, we should use the XMLHttpRequest API when the
requested URL is not a http(s) URL.
Fixes#9361
These were removed in PR 9170, since they were unused in the browsers that we'll support in PDF.js version `2.0`.
However looking at the output of Travis, where a subset of the unit-tests are run using Node.js, there's warnings about `btoa` being undefined. This doesn't appear to cause any errors, which probably explains why we didn't notice this before (despite PR 9201).
Since multiple empty lines is virtually unused in the code-base, and the few cases that do exist look like "typos", let's enforce greater consistency here; please see https://eslint.org/docs/rules/no-multiple-empty-lines.
Initially I just implemented the unit tests, but quickly found that they
were failing my expectation of having a size of 256 items. Some of them
did contain 256 items and some did not. I looked up various resources
and figured that they indeed all need to have 256 items. One of the good
resources is https://github.com/davidben/poppler/blob/master/poppler/FontEncodingTables.cc
Aside from some missing `notdef` (empty string) entries at the end of
the arrays, which I assume causes issues since it may cause
out-of-bounds array access which in JavaScript gives `undefined`, there
was a `notdef` entry missing in the `MacExpertEncoding`, causing the
entries after that to be shifted. This fix for this is similar to the
one in #8589.
The unit tests verify that, for known encoding names, the return value
is not only an array, but that it is also of the right length and
contains only strings.
Please note that while this could be considered a regression in user-facing behaviour, I'm not convinced that it's really a regression as such since prior to PR 8912 the Metadata would fail to parse (with an XML error) and thus be ignored when setting the viewer title.
With the refactored Metadata parsing we're now able to parse this, which uncovered issues with a subset of broken Ghostscript Metadata that uses HTML character names.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1424938
It is quite confusing that the custom function is called `log2` while it
actually returns the ceiling value and handles zero and negative values
differently than the native function.
To resolve this, we add a comment that explains these differences and
make the function use the native `Math` functions internally instead of
using our own custom logic. To verify that the function does what we
expect, we add unit tests.
All browsers except for IE support `Math.log2` for quite a long time
already (see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/log2).
For IE, we use the core-js polyfill.
According to the microbenchmark at https://jsperf.com/log2-pdfjs/1,
using the native functions should also be faster, in my testing almost
six times as fast.
*This patch is the result of me starting to look into moving parameters from `PDFJS` into `getDocument` and other API methods.*
When familiarizing myself with the code, the signatures of the various network streams seemed to be unnecessarily cumbersome since `disableRange` is currently handled separately from other parameters.
I'm assuming that the explanation for this is probably "for historical reasons", as is often the case. Hence I'd like to clean this up *before* we start the larger, and more invasive, `PDFJS` parameter re-factoring.
Nothing uses this option anymore, so setting it is a no-op now. We can
safely remove it.
Use `SKIP_BABEL` (instead of `PDFJS_NEXT`) now if you want to skip Babel
translation for a build.
This patch makes use of the existing `ignoreErrors` property in `src/core/evaluator.js`, see PRs 8240 and 8441, thus allowing us to attempt to recovery as much as possible of a page even when it contains broken XObjects.
Fixes 8702.
Fixes 8704.
*Follow-up to PR 8909.*
This requires us to pass around `pdfFunctionFactory` to quite a lot of existing code, however I don't see another way of handling this while still guaranteeing that we can access `PDFFunction` as freely as in the old code.
Please note that the patch passes all tests locally (unit, font, reference), and I *very* much hope that we have sufficient test-coverage for the code in question to catch any typos/mistakes in the re-factoring.
The `DOMParser` is most likely overkill and may be less secure.
Moreover, it is not supported in Node.js environments.
This patch replaces the `DOMParser` with a simple XML parser. This
should be faster and gives us Node.js support for free. The simple XML
parser is a port of the one that existed in the examples folder with a
small regex fix to make the parsing work correctly.
The unit tests are extended for increased test coverage of the metadata
code. The new method `getAll` is provided so the example does not have
to access internal properties of the object anymore.
This patch provides a new unit tested factory for creating SVG
containers and elements. This code is duplicated twice in the
codebase, but with upcoming changes this would need to be duplicated
even more. Moreover, consolidating this code in one factory allows
us to replace it easily for e.g., supporting Node.js. Therefore, move
this to a central place and update/ES6-ify the related code.
Finally, we replace `setAttributeNS` with `setAttribute` because no
namespace is provided.
This changes both `PDFViewer` and `PDFThumbnailViewer` to return early in the `pagesRotation` setters if the rotation doesn't change.
It also fixes an existing issue, in `PDFViewer`, that would cause errors if the rotation changes *before* the scale has been set to a non-default value.
Finally, in preparation for subsequent patches, it also refactors the rotation code in `web/app.js` to update the thumbnails and trigger rendering with the new `rotationchanging` event.
According to the specification, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#page=377, a `Dest` entry in an outline item should *not* contain a dictionary.
Unsurprisingly there's PDF generators that completely ignore this, treating is an `A` entry instead.
The patch also adds a little bit more validation code in `Catalog.parseDestDictionary`.
When running browser tests, e.g. via `gulp unittest`, the test files are not
processed by babel, and neither by the "unittestcli" gulp target.
This commit copies the babelPluginReplaceNonWebPackRequire plugin from the
unittestcli target to the SystemJS config so that `__non_webpack_require__` is
replaced with `require` for all build targets, and adds a unit test to ensure
that this indeed works as expected.
Added unit-tests for DeviceGray, DeviceRGB and DeviceCMYK
Added unit-tests for CalGray
Added unit-tests for CalRGB
Removed redundant code
Added unit-tests for LabCS
Added unit-tests for IndexedCS
Update comment
Change lookup to Uint8Array as mentioned in pdf specs(these tests will pass after PR #8666 is merged).
Added unit-tests for AlternateCS
Resolved code-style issues
Fixed code-style issues
Addressed issues pointed out in https://github.com/mozilla/pdf.js/pull/8611#pullrequestreview-52865469