The `DOMParser` is most likely overkill and may be less secure.
Moreover, it is not supported in Node.js environments.
This patch replaces the `DOMParser` with a simple XML parser. This
should be faster and gives us Node.js support for free. The simple XML
parser is a port of the one that existed in the examples folder with a
small regex fix to make the parsing work correctly.
The unit tests are extended for increased test coverage of the metadata
code. The new method `getAll` is provided so the example does not have
to access internal properties of the object anymore.
Implement a serialization "generator" for `DOMElement` in domutils.js
that yields the serialization of the SVG element. This method is used by
a newly added `ReadableSVGStream` class, which can be used like any
other readable stream in Node.js.
This reduces the memory requirements. Now, it is not needed to require
the serialization to fully fit in memory.
Note: The implementation of the serializer is a state machine in ES5
since the rest of the file is also in ES5. Its functionality is
equivalent to:
```
function* serializeSVGElement(elem) {
yield '<' + elem.nodeName;
if (elem.nodeName === 'svg:svg') {
yield ' xmlns:xlink="http://www.w3.org/1999/xlink"' +
' xmlns:svg="http://www.w3.org/2000/svg"';
}
for (let i in elem.attributes) {
yield ' ' + i + '="' + xmlEncode(elem.attributes[i]) + '"';
}
yield '>';
if (elem.nodeName === 'svg:tspan' || elem.nodeName === 'svg:style') {
yield xmlEncode(elem.textContent);
} else {
for (let childNode of elem.childNodes) {
yield* serializeSVGElement(childNode);
}
}
yield '</' + elem.nodeName + '>';
}
```
- Mark the test as async, and don't swallow exceptions.
- Fix the DOMElement polyfill to behave closer to the actual getAttributeNS
method, which excludes the namespace prefix.
Do not directly export to global. Instead, export all stubs in domstubs.js and
add a method setStubs to assign all exported stubs to a namespace. Then replace
the import domstubs with an explicit call to this setStubs method. Also added
unsetStubs for undoing the changes. This is done to allow unit testing of the
SVG backend without namespace pollution.
Wait for the completion of writing the generated SVG file before
processing the next page. This is to enable the garbage collector to
garbage-collect the (potentially large) SVG string before trying to
allocate memory again for the next page.
Note that since the PDF-to-SVG conversion is now sequential instead of
parallel, the time to generate all pages increases.
Test case:
node --max_old_space_size=200 examples/node/pdf2svg.js /tmp/FatalProcessOutOfMemory.pdf
Before this patch:
- Node.js crashes due to OOM after processing 20 pages.
After this patch:
- Node.js is able to convert all 203 PDFs to SVG without crashing.
Test case:
Using the PDF file from https://github.com/mozilla/pdf.js/issues/8534
node --max_old_space_size=200 examples/node/pdf2svg.js /tmp/FatalProcessOutOfMemory.pdf
Before this patch:
Node.js crashes due to OOM after processing 10 pages.
After this patch:
Node.js crashes due to OOM after processing 19 pages.
It doesn't really make sense to attempt to utilize the `NativeImageDecoder` in Node, since there's no native image support available, hence building on PR 8035 we can easily disable it in the example.
Fixes 7901.
This patch lets the AcroForms example make use of the built-in interactive
forms functionality in PDF.js. This makes the example:
- much easier to understand;
- more feature-complete;
- in sync with the core when new functionality is added;
- similar to the other examples in terms of structure.
The original code is difficult to read and, more importantly, performs
actions that are not described in the specification. It replaces empty
names with a backtick and an index, but this behavior is not described
in the specification. While the specification is not entirely clear
about what should happen in this case, it does specify that the `T`
field is optional and that multiple field dictionaries may have the same
fully qualified name, so to achieve this it makes the most sense to
ignore missing `T` fields during construction of the field name. This is
the most specification-compliant solution and, judging by opened issue #6623, also the required and expected behavior.
This patch:
- resolves a warning in the console about missing character encoding;
- makes the viewer use the same background color and PDF file as the
regular viewer;
- simplifies the example to bring it more in line with the other
examples.
Since the `mobile-viewer` example is based on the old FirefoxOS/B2G PDF viewer, it didn't need to have the same kind of `open/close` methods as the default viewer.
However, now that it has been re-purposed as a simple `mobile-viewer` example, it seems like a good idea to ensure that it has proper asynchronous `open/close` methods.
Fixes 7571.
Previously the commands were not properly parsed as such by GitHub
because they need to be indented with four spaces.
Furthermore, address some minor textual nits.
Re: issue 7017.
This should prevent the error, but does not fix the broken rendering.
The rendering issues are caused by `svg.js` not supporting the various text rendering modes, in this case specifically "invisible" (as indicated in the console `Warning: Unimplemented method setTextRenderingMode`).
Compared to the canvas case, where we just ignore invisible text, a smarter solution is probably required for the SVG case. Since just ignoring invisible text in `svg.js` would mean that text-selection isn't possible.
This patch makes the naming consistent with the `TextLayerBuilder`, and also the new `AnnotationLayer`, and should thus help reduce possible confusion when working with the code.
Please note that the files were renamed using `git mv`, in order to preserve blame.