Fix the handling of /Filter-entries, since the current implementation could potentially corrupt the data if there are multiple filters present.
Please note that filters are applied *sequentially* during decoding, starting from the first one in the Array, hence the first Array-entry needs to be /FlateDecode in order for things to actually work correctly.
To prevent a future bug, if we want to save more "complex" data such as images, also ensure that we include any existing /DecodeParms-entries when updating the /Filter-entry.
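A rough sketch of the intended behaviour (using hypothetical dictionary helpers, not the actual PDF.js primitives): the new /FlateDecode entry is prepended to any existing /Filter array, and a matching `null` entry is prepended to any existing /DecodeParms array so that the two arrays stay aligned.
```js
// Hypothetical sketch, not the actual implementation.
function updateStreamFilters(dict) {
  let filters = dict.get("Filter") ?? [];
  if (!Array.isArray(filters)) {
    filters = [filters];
  }
  let decodeParms = dict.get("DecodeParms") ?? [];
  if (!Array.isArray(decodeParms)) {
    decodeParms = [decodeParms];
  }
  // Filters are applied *sequentially* during decoding, hence the new
  // FlateDecode entry must come first in the /Filter array.
  dict.set("Filter", ["FlateDecode", ...filters]);
  // Keep /DecodeParms aligned with /Filter; `null` means "no parameters"
  // for the newly added FlateDecode entry.
  if (decodeParms.length > 0) {
    dict.set("DecodeParms", [null, ...decodeParms]);
  }
}
```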
The existing unit-test doesn't work as intended, since the page never actually renders. Note how `cleanup` is *not* allowed to run when parsing and/or rendering is ongoing, however an (old) incorrect condition could prevent rendering from ever starting.
This is very old code, which has been slightly refactored a couple of times (many years ago), however this doesn't appear to affect e.g. the default viewer since the incorrect behaviour seems highly dependent on "unlucky" timing.
Note also how at the start of the `PDFPageProxy.prototype.render`-method we purposely cancel any pending `cleanup`-call, to prevent unnecessary re-parsing for multiple sequential `render`-calls.
Finally, avoid running `cleanup` when document/page destruction has already started since it's pointless in that case.
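To illustrate the intended control flow, here's a simplified sketch (with assumed property names, not the actual `PDFPageProxy` code):
```js
class PageSketch {
  render(params) {
    // Cancel any pending delayed `cleanup`-call, to prevent unnecessary
    // re-parsing when multiple sequential `render`-calls occur.
    clearTimeout(this._delayedCleanupTimeout);
    this._renderInProgress = true;
    // ... kick off parsing/rendering ...
  }

  cleanup() {
    if (this._renderInProgress || this._destroyed) {
      // Never discard state while parsing/rendering is ongoing, and don't
      // bother once document/page destruction has already started.
      return false;
    }
    // ... release the cached operator-list, intent states, etc. ...
    return true;
  }
}
```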
- Remove the dependency on fit-curve;
- Improve the drawing of the current line by using a Path2D, and by clearing
  only the last part of the curve instead of clearing the whole canvas
  (see the sketch after this list);
- Smooth the curve while drawing, to avoid visible changes once the
  drawing ends;
- Make the smoothing a bit less aggressive.
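A minimal sketch of the Path2D-based approach (names and data shapes assumed, not the editor's actual code):
```js
const committedPath = new Path2D(); // The already-drawn, stable part of the curve.

function drawNewSegment(ctx, dirtyBox, segment) {
  const { x, y, width, height } = dirtyBox;
  ctx.save();
  // Only clear/redraw the area touched by the last (re-smoothed) segment,
  // instead of clearing the whole canvas.
  ctx.beginPath();
  ctx.rect(x, y, width, height);
  ctx.clip();
  ctx.clearRect(x, y, width, height);
  ctx.stroke(committedPath);
  ctx.restore();

  // Draw the newly smoothed segment and commit it to the cached Path2D.
  const path = new Path2D();
  path.moveTo(segment.x0, segment.y0);
  path.quadraticCurveTo(segment.cx, segment.cy, segment.x1, segment.y1);
  ctx.stroke(path);
  committedPath.addPath(path);
}
```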
Given that the `css` property isn't constant, since it contains document/font ids, we cannot just check it directly. However, we can make use of regular expressions to ensure that the format is generally correct.
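For example, a unit-test can assert the general shape of the generated rule with a regular expression rather than an exact string (the pattern below is only a hypothetical illustration):
```js
// Hypothetical check: the document/font ids differ between runs, so only the
// overall format is validated.
expect(cssText).toMatch(
  /^@font-face\s*\{\s*font-family:\s*"g_d\d+_f\d+";\s*src:\s*url\(.+\);?\s*\}$/
);
```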
Originally the `PDFSidebarResizer` class was slightly larger, since the code used to contain e.g. feature testing for older (and no longer supported) browsers.
Given that there's some amount of overlap, when it comes to what DOM-elements and state these classes need, it now seems reasonable to simply move the sidebar-resizing into the `PDFSidebar` class.
For the MOZCENTRAL build-target this patch reduces the size of the *built* `web/viewer.js` file by just over `1.1` kilobytes.
On my computer, it takes a few tenths of a second to load a local font.
Since a font can be used several times in a document, caching it will
improve performance.
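A minimal sketch of the idea (names assumed): cache the load promise keyed by the font, so repeated uses in the same document only pay the cost once.
```js
const loadedFontCache = new Map();

function loadLocalFontCached(key, loadFont) {
  let promise = loadedFontCache.get(key);
  if (!promise) {
    promise = loadFont(key); // The slow part: a few tenths of a second.
    loadedFontCache.set(key, promise);
  }
  return promise;
}
```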
Some Arabic chars, like \ufe94, can be searched for in a PDF, hence they must be normalized
when creating the search query. To avoid duplicating the normalization code,
everything is moved into the find controller.
The previous code to normalize text used NFKC, but with a hardcoded map; it
has been replaced by the use of normalize("NFKC"), which helps to reduce the
bundle size by 30kb.
While playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking
some RTL unicode ranges into account, that the generated font wasn't embedding the
mapping for this char, and that the unicode ranges in the OS/2 table weren't up-to-date.
When normalized, some chars can be replaced by several ones, which led to some
extra chars in the text layer. To avoid any regression, when copying text
from the text layer, the copied string is normalized (NFKC) before being put in the
clipboard (this is how it works in both Acrobat and Chrome).
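For example, here's what the built-in normalization does for the two cases mentioned above (a presentation form mapping to its base letter, and a single char expanding to several):
```js
// The Arabic presentation form U+FE94 normalizes to the base letter U+0629:
"\ufe94".normalize("NFKC"); // "\u0629"

// A single char can also expand to several ones, e.g. the ffi ligature:
"\ufb03".normalize("NFKC"); // "ffi"
```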
*Please note:* This patch only extends the `PDFFindController` implementation itself to support this functionality, however it's *purposely* not exposed in the default viewer.
This replaces the previous `phraseSearch`-parameter, and a `query`-string will now always be interpreted as a phrase-search.
To enable searching for individual words, the `query`-parameter must instead consist of an Array of strings. This way it's also possible to combine phrase- and word-searches, with a `query`-parameter looking something like `["Lorem ipsum", "foo", "bar"]`, which will search for the phrase "Lorem ipsum" *and* for the individual words "foo" and "bar".
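A hypothetical usage sketch, when driving the `PDFFindController` through the eventBus (the parameters other than `query` are assumptions here):
```js
eventBus.dispatch("find", {
  source: null,
  type: "",
  // Combine a phrase-search and two word-searches in one query.
  query: ["Lorem ipsum", "foo", "bar"],
  caseSensitive: false,
  highlightAll: true,
});
```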
*Please note:* This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons.
This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous *default* text-extraction behaviour.
However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in *more* (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together.
Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.
Unfortunately I don't believe that we can simply add a default `--scale-factor` CSS-variable to the `container`-element, since that might not be entirely appropriate/correct in all cases.[1]
However, we can at least print a console-error to hopefully make this situation more apparent to users. (This is purposely not using the `warn` helper-function, since those messages can be disabled.)
---
[1] One example is in our reference-tests, where we don't need to add it to the `container`-element itself.
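For reference, when embedding the library directly, the variable is expected to be set on the relevant container by the user; something along these lines (a sketch, not viewer code):
```js
const viewport = page.getViewport({ scale });
// The text/annotation layers read this CSS variable to size themselves.
container.style.setProperty("--scale-factor", viewport.scale);
```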
PDF 32000-1:2008 7.10.5.1 "Type 4 (PostScript Calculator) Functions"
defers to the PostScript Language Reference for the description of these
functions. The PostScript Language Reference, third edition chapter 8
"Operators" defines the `angle` type as a "number of degrees". Section
8.1 defines "angle `sin` real", "angle `cos` real", and "num den `atan`
angle". The documentation for `atan` further states that it will return
an angle in degrees between 0 and 360.
Handle these operators correctly in `PostScriptEvaluator.execute`.
Convert the inputs to `sin` and `cos` from degrees to radians for use
with `Math.sin` and `Math.cos`. Correctly pop two values from the stack
for `atan`, use `Math.atan2`, and convert from radians to (positive)
degrees.
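An illustrative sketch of those semantics (standalone helpers, not the exact `PostScriptEvaluator.execute` code):
```js
function psSin(stack) {
  const angle = stack.pop(); // Degrees, per the PostScript definition.
  stack.push(Math.sin((angle / 180) * Math.PI));
}

function psCos(stack) {
  const angle = stack.pop();
  stack.push(Math.cos((angle / 180) * Math.PI));
}

function psAtan(stack) {
  const den = stack.pop(); // "num den atan angle": den is on top of the stack.
  const num = stack.pop();
  let angle = (Math.atan2(num, den) / Math.PI) * 180;
  if (angle < 0) {
    angle += 360; // `atan` returns a positive angle between 0 and 360 degrees.
  }
  stack.push(angle);
}
```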
We introduced the use of OffscreenCanvas in #14754 and this patch aims
to use them for all kinds of images.
It'll slightly improve performance (and maybe slightly decrease memory use).
Since an image can be rendered using transfer maps, but because of
OffscreenCanvas we don't have access to the underlying pixel array, the
transfer-map handling is re-implemented using the SVG filter feComponentTransfer.
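A rough sketch of that approach (element ids and parameter names are assumptions): the transfer maps become `tableValues` lookup tables on an feComponentTransfer filter, which is then applied through the canvas `filter` property when drawing the image.
```js
function drawImageWithTransferMaps(ctx, image, mapR, mapG, mapB) {
  const NS = "http://www.w3.org/2000/svg";
  const svg = document.createElementNS(NS, "svg");
  const filter = document.createElementNS(NS, "filter");
  filter.setAttribute("id", "transferMapFilter");
  const transfer = document.createElementNS(NS, "feComponentTransfer");

  for (const [name, map] of [
    ["feFuncR", mapR],
    ["feFuncG", mapG],
    ["feFuncB", mapB],
  ]) {
    const func = document.createElementNS(NS, name);
    func.setAttribute("type", "table");
    // Each map is assumed to be a 256-entry array of byte values.
    func.setAttribute("tableValues", Array.from(map, v => v / 255).join(" "));
    transfer.append(func);
  }
  filter.append(transfer);
  svg.append(filter);
  document.body.append(svg);

  ctx.filter = "url(#transferMapFilter)";
  ctx.drawImage(image, 0, 0);
  ctx.filter = "none";
  svg.remove();
}
```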
Rather than repeatedly initializing a `canvasFactory`-instance for every page, move it to the document-level instead.
*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
I noticed several 'Path not found' errors because of a field called #subform[2].
According to the XFA specs, the hash denotes a class of elements in the template tree.
When we're looking for a node in the datasets tree, it doesn't make sense to search
for a class, hence path elements starting with a hash are simply skipped.
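A minimal illustration of the idea (path handling greatly simplified, names assumed):
```js
function resolveInDatasets(root, path) {
  let node = root;
  for (const part of path.split(".")) {
    // Skip class selectors such as "#subform[2]": classes only exist in the
    // template tree, so searching the datasets tree for them makes no sense.
    if (part.startsWith("#")) {
      continue;
    }
    const name = part.replace(/\[\d+\]$/, ""); // Drop any "[n]" index.
    node = node?.children?.find(child => child.name === name) ?? null;
    if (!node) {
      return null; // Would previously surface as "Path not found".
    }
  }
  return node;
}
```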
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the *built* `pdf.js` and `pdf.worker.js` files.
This further extends the web-specific import maps introduced in PR 16009, to allow removing *most* of the build-time `require` statements from the viewer. The few remaining ones are fallbacks used for the COMPONENTS and the `legacy` GENERIC builds, respectively.