With `PDFFindController` instances no longer (directly) depending on
`BaseViewer` instances, we can pass a single `findController` when
initializing a viewer, similar to other components.
This removes the dependency on a `PDFViewer` instance from the find
controller, which makes it more similar to other components and makes it
easier to unit test with a mock link service.
Finally, we remove the search capabilities from the SVG example since it
doesn't work there because there is no separate text layer.
Now it follows the same pattern as e.g., the document properties
component, which allows us to have one instance of the find controller
and set a new document to search upon switching documents.
Moreover, this allows us to get rid of the dependency on `pdfViewer` in
order to fetch the text content for a page. This is working towards
getting rid of the `pdfViewer` dependency upon initializing the
component entirely in future commits.
Finally, we make the `reset` method private since it's not supposed to
be used from the outside anymore now that `setDocument` takes care of
this, similar to other components.
The purpose of this patch is to provide a better default behaviour when `JpegImage` is used to parse standalone JPEG images with CMYK colour spaces.
Since the issue that the patch concerns is somewhat of a special-case, the implementation utilizes the already existing decode support in an attempt to minimize the impact w.r.t. code size.
*Please note:* It's always possible for the user of `JpegImage` to control image inversion, and thus override the new behaviour, by simply passing a custom `decodeTransform` array upon initialization.
This should provide a better out-of-the-box experience when using PDF.js in a Node.js environment, since it's missing native support for both `@font-face` and `Image`.
Please note that this change *only* affects the default values, hence it's still possible for an API consumer to override those values when calling `getDocument`.
Also, prevents "ReferenceError: document is not defined" errors, when running the unit-tests in Node.js/Travis.
We need to pass `disableFontFace` and `nativeImageDecoderSupport`
because Node.js has no native support for `@font-face` and `Image`.
Doing so makes it possible to render e.g., the Tracemonkey paper, which
failed before. I made this PDF file the default because it's also the
default in other examples/demos and because it showcases the
possibilities better than the very simple hello world PDF file.
Building the library with `gulp dist-install` is easier and is already
recommended in the other examples.
Rather than having two different (but connected) options for the textLayer, I think that it makes sense to try and unify this. For example: currently if `disableTextLayer === true`, then the value of `enhanceTextSelection` is simply ignored.
Since PDF.js version `2.0` already won't be backwards compatible in lots of ways, I don't think that we need to worry about migrating existing preferences here.
This removes the `PDFJS.useOnlyCssZoom` dependency from the viewer components, but please note that as a *temporary* solution the default viewer still uses it.
Despite this patch removing the `disableWorker` option itself, please note that we'll still fallback to loading the worker file(s) on the main-thread when running in environments without proper Web Worker support.
Furthermore it's still possible, even with this patch, to force the use of fake workers by manually loading the necessary file using a `<script>` tag on the main-thread.[1]
That way, the functionality of the now removed `SINGLE_FILE` build target and the resulting `build/pdf.combined.js` file can still be achieved simply by adding e.g. `<script src="build/pdf.worker.js"></script>` to the HTML (obviously with the path adjusted as needed).
Finally note that the `disableWorker` option is a performance footgun, and unfortunately many existing third-party examples actually use it without providing any sort of warning/justification.
---
[1] This approach is used in the default viewer, since certain kind of debugging may be easier if the code is running directly on the main-thread.
These were removed in PR 9170, since they were unused in the browsers that we'll support in PDF.js version `2.0`.
However looking at the output of Travis, where a subset of the unit-tests are run using Node.js, there's warnings about `btoa` being undefined. This doesn't appear to cause any errors, which probably explains why we didn't notice this before (despite PR 9201).
The `DOMParser` is most likely overkill and may be less secure.
Moreover, it is not supported in Node.js environments.
This patch replaces the `DOMParser` with a simple XML parser. This
should be faster and gives us Node.js support for free. The simple XML
parser is a port of the one that existed in the examples folder with a
small regex fix to make the parsing work correctly.
The unit tests are extended for increased test coverage of the metadata
code. The new method `getAll` is provided so the example does not have
to access internal properties of the object anymore.
Implement a serialization "generator" for `DOMElement` in domutils.js
that yields the serialization of the SVG element. This method is used by
a newly added `ReadableSVGStream` class, which can be used like any
other readable stream in Node.js.
This reduces the memory requirements. Now, it is not needed to require
the serialization to fully fit in memory.
Note: The implementation of the serializer is a state machine in ES5
since the rest of the file is also in ES5. Its functionality is
equivalent to:
```
function* serializeSVGElement(elem) {
yield '<' + elem.nodeName;
if (elem.nodeName === 'svg:svg') {
yield ' xmlns:xlink="http://www.w3.org/1999/xlink"' +
' xmlns:svg="http://www.w3.org/2000/svg"';
}
for (let i in elem.attributes) {
yield ' ' + i + '="' + xmlEncode(elem.attributes[i]) + '"';
}
yield '>';
if (elem.nodeName === 'svg:tspan' || elem.nodeName === 'svg:style') {
yield xmlEncode(elem.textContent);
} else {
for (let childNode of elem.childNodes) {
yield* serializeSVGElement(childNode);
}
}
yield '</' + elem.nodeName + '>';
}
```
- Mark the test as async, and don't swallow exceptions.
- Fix the DOMElement polyfill to behave closer to the actual getAttributeNS
method, which excludes the namespace prefix.
Do not directly export to global. Instead, export all stubs in domstubs.js and
add a method setStubs to assign all exported stubs to a namespace. Then replace
the import domstubs with an explicit call to this setStubs method. Also added
unsetStubs for undoing the changes. This is done to allow unit testing of the
SVG backend without namespace pollution.
Wait for the completion of writing the generated SVG file before
processing the next page. This is to enable the garbage collector to
garbage-collect the (potentially large) SVG string before trying to
allocate memory again for the next page.
Note that since the PDF-to-SVG conversion is now sequential instead of
parallel, the time to generate all pages increases.
Test case:
node --max_old_space_size=200 examples/node/pdf2svg.js /tmp/FatalProcessOutOfMemory.pdf
Before this patch:
- Node.js crashes due to OOM after processing 20 pages.
After this patch:
- Node.js is able to convert all 203 PDFs to SVG without crashing.
Test case:
Using the PDF file from https://github.com/mozilla/pdf.js/issues/8534
node --max_old_space_size=200 examples/node/pdf2svg.js /tmp/FatalProcessOutOfMemory.pdf
Before this patch:
Node.js crashes due to OOM after processing 10 pages.
After this patch:
Node.js crashes due to OOM after processing 19 pages.
It doesn't really make sense to attempt to utilize the `NativeImageDecoder` in Node, since there's no native image support available, hence building on PR 8035 we can easily disable it in the example.
Fixes 7901.
This patch lets the AcroForms example make use of the built-in interactive
forms functionality in PDF.js. This makes the example:
- much easier to understand;
- more feature-complete;
- in sync with the core when new functionality is added;
- similar to the other examples in terms of structure.
The original code is difficult to read and, more importantly, performs
actions that are not described in the specification. It replaces empty
names with a backtick and an index, but this behavior is not described
in the specification. While the specification is not entirely clear
about what should happen in this case, it does specify that the `T`
field is optional and that multiple field dictionaries may have the same
fully qualified name, so to achieve this it makes the most sense to
ignore missing `T` fields during construction of the field name. This is
the most specification-compliant solution and, judging by opened issue #6623, also the required and expected behavior.
This patch:
- resolves a warning in the console about missing character encoding;
- makes the viewer use the same background color and PDF file as the
regular viewer;
- simplifies the example to bring it more in line with the other
examples.
Since the `mobile-viewer` example is based on the old FirefoxOS/B2G PDF viewer, it didn't need to have the same kind of `open/close` methods as the default viewer.
However, now that it has been re-purposed as a simple `mobile-viewer` example, it seems like a good idea to ensure that it has proper asynchronous `open/close` methods.
Fixes 7571.
Previously the commands were not properly parsed as such by GitHub
because they need to be indented with four spaces.
Furthermore, address some minor textual nits.
Re: issue 7017.
This should prevent the error, but does not fix the broken rendering.
The rendering issues are caused by `svg.js` not supporting the various text rendering modes, in this case specifically "invisible" (as indicated in the console `Warning: Unimplemented method setTextRenderingMode`).
Compared to the canvas case, where we just ignore invisible text, a smarter solution is probably required for the SVG case. Since just ignoring invisible text in `svg.js` would mean that text-selection isn't possible.
This patch makes the naming consistent with the `TextLayerBuilder`, and also the new `AnnotationLayer`, and should thus help reduce possible confusion when working with the code.
Please note that the files were renamed using `git mv`, in order to preserve blame.
This is intended as a temporary solution, in order to get the "simpleviewer" example to work, until we've re-factored `scrollIntoView` to work in both the standard and components-based viewers.
"text/javascript" is not a correct MIME type (the correct one is
"application/javascript") but it's not even needed; all browsers default
to the correct type and treat it as executable JS when type is ommited.
Since not all browsers recognize the "application/javascript" MIME type
the only way to both stay compliant and to support all popular browsers
is to omit the type. It's also shorter this way.