The `DOMParser` is most likely overkill and may be less secure.
Moreover, it is not supported in Node.js environments.
This patch replaces the `DOMParser` with a simple XML parser. This
should be faster and gives us Node.js support for free. The simple XML
parser is a port of the one that existed in the examples folder with a
small regex fix to make the parsing work correctly.
The unit tests are extended for increased test coverage of the metadata
code. The new method `getAll` is provided so the example does not have
to access internal properties of the object anymore.
Implement a serialization "generator" for `DOMElement` in domutils.js
that yields the serialization of the SVG element. This method is used by
a newly added `ReadableSVGStream` class, which can be used like any
other readable stream in Node.js.
This reduces the memory requirements. Now, it is not needed to require
the serialization to fully fit in memory.
Note: The implementation of the serializer is a state machine in ES5
since the rest of the file is also in ES5. Its functionality is
equivalent to:
```
function* serializeSVGElement(elem) {
yield '<' + elem.nodeName;
if (elem.nodeName === 'svg:svg') {
yield ' xmlns:xlink="http://www.w3.org/1999/xlink"' +
' xmlns:svg="http://www.w3.org/2000/svg"';
}
for (let i in elem.attributes) {
yield ' ' + i + '="' + xmlEncode(elem.attributes[i]) + '"';
}
yield '>';
if (elem.nodeName === 'svg:tspan' || elem.nodeName === 'svg:style') {
yield xmlEncode(elem.textContent);
} else {
for (let childNode of elem.childNodes) {
yield* serializeSVGElement(childNode);
}
}
yield '</' + elem.nodeName + '>';
}
```
- Mark the test as async, and don't swallow exceptions.
- Fix the DOMElement polyfill to behave closer to the actual getAttributeNS
method, which excludes the namespace prefix.
Do not directly export to global. Instead, export all stubs in domstubs.js and
add a method setStubs to assign all exported stubs to a namespace. Then replace
the import domstubs with an explicit call to this setStubs method. Also added
unsetStubs for undoing the changes. This is done to allow unit testing of the
SVG backend without namespace pollution.
Wait for the completion of writing the generated SVG file before
processing the next page. This is to enable the garbage collector to
garbage-collect the (potentially large) SVG string before trying to
allocate memory again for the next page.
Note that since the PDF-to-SVG conversion is now sequential instead of
parallel, the time to generate all pages increases.
Test case:
node --max_old_space_size=200 examples/node/pdf2svg.js /tmp/FatalProcessOutOfMemory.pdf
Before this patch:
- Node.js crashes due to OOM after processing 20 pages.
After this patch:
- Node.js is able to convert all 203 PDFs to SVG without crashing.
Test case:
Using the PDF file from https://github.com/mozilla/pdf.js/issues/8534
node --max_old_space_size=200 examples/node/pdf2svg.js /tmp/FatalProcessOutOfMemory.pdf
Before this patch:
Node.js crashes due to OOM after processing 10 pages.
After this patch:
Node.js crashes due to OOM after processing 19 pages.
It doesn't really make sense to attempt to utilize the `NativeImageDecoder` in Node, since there's no native image support available, hence building on PR 8035 we can easily disable it in the example.
Fixes 7901.