By leveraging import maps we can get rid of *most* of the remaining `require`-calls in the `src/display/`-folder, since we should strive to use modern `import`-statements wherever possible.
The only remaining cases are Node.js-specific dependencies, since those seem very difficult to convert unless we start producing a bundle *specifically* for Node.js environments.
With the changes in the previous patch the `isNodeJS`-helper no longer needs to live in its own file, which helps get rid of a closure in the *built* files.
Currently this class contains a few "special" code-paths for the COMPONENTS build-target, which normally wouldn't be a problem. However, in this particular case that means accessing code that we don't want to include unconditionally in all builds.
This is currently implemented using build-time `require`-calls which we nowadays want to avoid, and we should strive to remove all such cases from the code-base. (Generally speaking `import` is the future, and build-tools may not always play well with a mix of both formats.)
We can easily improve things here by using sub-classing for the COMPONENTS build-target, and then use the ability to re-name when exporting (to avoid breaking existing code).
The existing code is unable to *correctly* extract the color from the appearance-stream when the ColorSpace-data is "complex". To reproduce this:
- Open `freetexts.pdf` in the viewer.
- Note the purple color of the "Hello World from Preview" annotation.
- Enable any of the Editors.
- Note how the relevant annotation is now black.
When there was a rotation, the generated bbox was wrong because of an inversion
between width and height.
This patch aims to fix this issue in re-writing the FreeText code generation
to have something similar to what Acrobat does.
And fix the name of the font which wasn't the correct one when calling the
evaluator.
Rather than having to *manually* determine the potential `transfers` at various spots in the API, we can let the `AnnotationStorage.serializable` getter include this.
To further simplify things, we can also let the `serializable` getter compute and include the `hash`-string as well.
In order to minimize the size the of a saved pdf, we generate only one
image and use a reference in each annotation using it.
When printing, it's slightly different since we have to render each page
independantly but we use the same image within a page.
It occurred to me that we can actually run this unit-test in Node.js environments by making use of the preprocessor to stub out the browser globals there.
Until now we've not actually had *any* tests that ensure that the *official* PDF.js-viewer API exposes the intended functionality, which means that things can easily break accidentally.
*Please note:* This unit-test cannot (easily) be run in Node.js-environments, since the `external/webL10n/l10n.js` file contains various browser-specific functionality.
Until now we've not actually had *any* tests that ensure that the *official* PDF.js API exposes the intended functionality, which means that things can easily break accidentally.
- it'll help to be able to move popups on screen to let the user read the text
- popups won't inherit some properties from their parent:
- the popup can be misrendered if for example the parent has a clip-path property.
- add an outline to the popup when the parent is focused.
- hide a popup when it's clicked.
Fix handling of /Filter-entries, since the current implementation could potentially corrupt the data if there's multiple filters present.
Please note that filters are applied *sequentially* during decoding, starting from the first one in the Array, hence the first Array-entry needs to be /FlateDecode in order for things to actually work correctly.
To prevent a future bug, if we want to save more "complex" data such as images, also ensure that we include any existing /DecodeParms-entries when updating the /Filter-entry.
The existing unit-test doesn't work as intended, since the page never actually renders. Note how `cleanup` is *not* allowed to run when parsing and/or rendering is ongoing, however an (old) incorrect condition could prevent rendering from ever starting.
This is very old code, which has been slightly re-factored a couple of times (many years ago), however this doesn't appear to affect e.g. the default viewer since the incorrect behaviour seem highly dependent on "unlucky" timing.
Note also how at the start of the `PDFPageProxy.prototype.render`-method we purposely cancel any pending `cleanup`-call, to prevent unnecessary re-parsing for multiple sequential `render`-calls.
Finally, avoid running `cleanup` when document/page destruction has already started since it's pointless in that case.
- Remove the dependency on fit-curve;
- Improve the way to draw the current line in using a Path2D and
in clearing only the last part of the curve instead of clearing
all the canvas;
- Smooth the curve when drawing to avoid to have some changes after
the drawing ends;
- Make the smoothing a bit less agressive.
Given that the `css` property isn't constant, since it contains document/font ids, we cannot just check it directly. However, we can make use of regular expressions to ensure that the format is generally correct.
Originally the `PDFSidebarResizer` class was slightly larger, since the code used to contain e.g. feature testing for older (and no longer supported) browsers.
Given that there's some amount of overlap, when it comes to what DOM-elements and state that these classes need, it now seems reasonable to simply move the sidebar-resizing into the `PDFSidebar` class.
For the MOZCENTRAL build-target this patch reduces the size of the *built* `web/viewer.js` file by just over `1.1` kilobytes.
On my computer, it takes few tenths of a second to load a local font.
Since a font can be used several times in a document, the cache will
improve performances.
Some arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized
when creating the search query. So to avoid to duplicate the normalization code,
everything is moved in the find controller.
The previous code to normalize text was using NFKC but with a hardcoded map, hence it
has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size
by 30kb).
In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into
account some RTL unicode ranges, the generated font wasn't embedding the mapping this
char and the unicode ranges in the OS/2 table weren't up-to-date.
When normalized some chars can be replaced by several ones and it induced to have
some extra chars in the text layer. To avoid any regression, when copying some text
from the text layer, a copied string is normalized (NFKC) before being put in the
clipboard (it works like this in either Acrobat or Chrome).
*Please note:* This patch only extends the `PDFFindController` implementation itself to support this functionality, however it's *purposely* not exposed in the default viewer.
This replaces the previous `phraseSearch`-parameter, and a `query`-string will now always be interpreted as a phrase-search.
To enable searching for individual words, the `query`-parameter must instead consist of an Array of strings. This way it's now also possible to combine phrase/word searches, with a `query`-parameter looking something like `["Lorem ipsum", "foo", "bar"]` which will search for the phrase "Lorem ipsum" *and* the words "foo" respectively "bar".
*Please note:* This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons.
This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous *default* text-extraction behaviour.
However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in *more* (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together.
Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.