- Take into account the page translation,
- Take into account the correct translation for the editor border,
- Take into account the position of the first glyph in the annotation,
- Take into account the rotation of the editor.
Close#16633.
- Do the /Filter and /DecodeParms lookup in parallel, since that ought to be a *tiny* bit more efficient.
- Avoid code-duplication when `CompressionStream` isn't supported, since we already have a fallback code-path at the end of the function.
createImageBitmap doesn't work with svg files (see bug 1841972), so we need to workaround
this in using an Image.
When printing/saving we must rasterize the image, hence we get the biggest bitmap as image
reference to avoid duplications or poor quality on rendering.
The existing code is unable to *correctly* extract the color from the appearance-stream when the ColorSpace-data is "complex". To reproduce this:
- Open `freetexts.pdf` in the viewer.
- Note the purple color of the "Hello World from Preview" annotation.
- Enable any of the Editors.
- Note how the relevant annotation is now black.
Note how we're accidentally using the wrong operator when trying to parse CMYK colors. I'm not aware of any bugs caused by this, since it seems uncommon in practice for annotations to specify text-colors in CMYK format.
When there was a rotation, the generated bbox was wrong because of an inversion
between width and height.
This patch aims to fix this issue in re-writing the FreeText code generation
to have something similar to what Acrobat does.
And fix the name of the font which wasn't the correct one when calling the
evaluator.
Given that the PDF.js library has never officially supported/documented that binary data can be provided as a `Buffer`, and that it's been explicitly deprecated in *four* releases, it seems reasonable that we outright reject such data instead (to reduce the amount of Node.js specific code-paths).
We've now been throwing an Error in *three* releases if the `canvasFactory` option is provided, hence it ought to be fine to stop doing that and simply ignore the option instead.
Rather than having to *manually* determine the potential `transfers` at various spots in the API, we can let the `AnnotationStorage.serializable` getter include this.
To further simplify things, we can also let the `serializable` getter compute and include the `hash`-string as well.
This is something that I completely overlooked during review of PR 16593, since the idea is (obviously) that the viewer-components should be usable as-is without the user needing to manually pass in any *additional* parameters.
To support this we can very easily expose the current `FilterFactory`-instance on the `PDFPageProxy`-class[1], and if needed initialize the highlight-filters when initializing the page (again limited to the viewer-components).
In order to minimize the size the of a saved pdf, we generate only one
image and use a reference in each annotation using it.
When printing, it's slightly different since we have to render each page
independantly but we use the same image within a page.
- Modify the text and background colors in popup to fit a11y requirements
- Add a backdrop filter on clickable areas in using a svg filter mapping
canvas colors to Highlight and HighlightText ones.
These constants were added "speculatively" in PR 10820, almost four years ago, but have never actually been used. We already have issue 10982 that tracks *potentially* extending support for the affected annotation-format, however until that happens I really don't think that we should keep shipping completely unused code in the PDF.js library.
For the MOZCENTRAL build-target, i.e. the Firefox PDF Viewer, this reduces the total bundle size by 1.1 kilo-byte.
With the changes in PR 16552 we can now move general translation into the `AnnotationLayer` itself, which should improve things ever so slightly in third-party implementations where the default viewer isn't used.
*This is something that I completely overlooked during review of PR 16552, despite leaving a l10n-related comment.*
The new l10n-handling of PopupAnnotations assume that the `AnnotationLayer` is always initialized with a l10n-instance, which might not actually be the case in third-party implementations where the default viewer isn't used.
To work-around that we'll now bundle, and fallback on, the existing `NullL10n`-implementation in GENERIC builds of the PDF.js library. This will only result in a slight file-size increase for the *built* `pdf.js` file, again limited to GENERIC builds, since the `web/l10n_utils.js` file has no dependencies.
Also, tweaks a couple of TESTING pre-processor checks to *only* include that code when running the reference tests.
- it'll help to be able to move popups on screen to let the user read the text
- popups won't inherit some properties from their parent:
- the popup can be misrendered if for example the parent has a clip-path property.
- add an outline to the popup when the parent is focused.
- hide a popup when it's clicked.
Fix handling of /Filter-entries, since the current implementation could potentially corrupt the data if there's multiple filters present.
Please note that filters are applied *sequentially* during decoding, starting from the first one in the Array, hence the first Array-entry needs to be /FlateDecode in order for things to actually work correctly.
To prevent a future bug, if we want to save more "complex" data such as images, also ensure that we include any existing /DecodeParms-entries when updating the /Filter-entry.
The existing unit-test doesn't work as intended, since the page never actually renders. Note how `cleanup` is *not* allowed to run when parsing and/or rendering is ongoing, however an (old) incorrect condition could prevent rendering from ever starting.
This is very old code, which has been slightly re-factored a couple of times (many years ago), however this doesn't appear to affect e.g. the default viewer since the incorrect behaviour seem highly dependent on "unlucky" timing.
Note also how at the start of the `PDFPageProxy.prototype.render`-method we purposely cancel any pending `cleanup`-call, to prevent unnecessary re-parsing for multiple sequential `render`-calls.
Finally, avoid running `cleanup` when document/page destruction has already started since it's pointless in that case.
After PR 16226 the deprecated SVG back-end is now unused in development mode, with the exception of unit-tests, hence we can re-factor how it's exposed in the API to avoid including a useless webpack-closure in e.g. the *built-in* Firefox PDF Viewer.
Given that this API method isn't used anywhere within the PDF.js library itself, except for the unit-tests, we can avoid including what's effectively dead code in e.g. the *built-in* Firefox PDF Viewer.
- Don't attempt to lookup an "SM" entry, since we're only using "SMask" in the `PDFImage` code and I also cannot find any mention in the PDF specification about that being a valid abbreviation for a Soft Mask entry. (There's only a `SM = Smoothness Tolerance` Graphics State parameter, which is obviously something completely different.)
- Don't lookup the /SMask and /Mask entries unless it's actually an inline image, since it's pointless otherwise.
- Last, but most importantly, only check for the *existence* of /SMask and /Mask entries but don't actually fetch the data. Note that if either one exists it'll contain a Stream, and those cannot be cached on the `XRef`-instance, which leads to unnecessary parsing/allocations and in this case we're not using the actual data for anything.
This patch is the result of me going through some old issues regarding non-embedded Wingdings support.
There's a few different things wrong in the referenced PDF document:
- The /BaseFont and /FontName entries don't agree on the name of the fonts, with one font using `/BaseFont /Wingdings-Regular` and `/FontName /wg09np` which obviously makes no sense.
To address this we'll compare the font-names against our lists of known ones and ignore /FontName entries that don't make sense iff the /BaseFont entry is a known font-name.
- The non-embedded Wingdings font also set an incorrect /Encoding, in this case /MacRomanEncoding, which should have been fixed by PR 16465. However this doesn't work since the font has *bogus* font-flags, that fail to categorize the font as Symbolic.
To address this we'll also compare the font-name against the list of known symbol fonts.
As far as I can tell there's no particular reason for initializing `KeyboardManager`-instances eagerly, since the user may never use editing, and we can easily do this lazily instead by utilizing shadowed getters.
This patch updates the minimum supported browsers as follows:
- Google Chrome 92, which was released on 2021-07-20; see https://support.google.com/chrome/a/answer/10314655
Note that nowadays we usually try, where feasible and possible, to support browsers that are about two years old. By limiting support to only "recent" browsers we reduce the risk of holding back improvements of the *built-in* Firefox PDF Viewer, and also (significantly) reduce the maintenance/support burden for the PDF.js contributors.
*Please note:* As always, the minimum supported browser version assumes that a `legacy`-build of the PDF.js library is being used; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
Now that font-substitution has been implemented, we should be able to do much a better job at supporting non-embedded Wingdings fonts.
Given that this is a Windows-specific font, see https://en.wikipedia.org/wiki/Wingdings, this is however not guaranteed to work (well) on other platforms.
The affected font is non-embedded ZapfDingbats, however the PDF document for some inexplicable reason specifies the encoding as "WinAnsiEncoding" (which is obviously wrong).
To work-around this bug in the PDF generator, we'll simply ignore any explicitly specified named encoding for non-embedded symbol fonts.
- Remove the dependency on fit-curve;
- Improve the way to draw the current line in using a Path2D and
in clearing only the last part of the curve instead of clearing
all the canvas;
- Smooth the curve when drawing to avoid to have some changes after
the drawing ends;
- Make the smoothing a bit less agressive.
Given that inline images may contain "EI"-sequences in the image-data itself, actually finding the end-of-image operator isn't always straightforward.
Here we extend the implementation from PR 12028 to potentially check all of the following bytes, rather than stopping immediately. While we have fairly decent test-coverage for this code, whenever you're changing it there's unfortunately a slightly higher than normal risk of regressions. (You'd really wish that PDF generators just stop using inline images.)
- if the contours count is lower than -1, the glyph is really likely wrong
so just remove it from the font;
- if a contour has the repeat flag then repeats count mustn't be 0.
The pdf linked in bug 1135277 contains a lot of stroke instructions.
In using the Firefox profiler, this patch helps to reduce the overall
spent time in this function by 30%.
According to https://en.wikipedia.org/wiki/Impact_(typeface) this font should be available on all current versions of Windows, and with the recently added font-substitution we should actually be able to render it correctly (at least on Windows).
The `fontID` handling is quite old and predates the use of the `idFactory` to generate a unique id for each font, hence we can simplify this code a little bit.
When fixing bug 1766987, I thought the field formatted value came from
the result of the format callback: I was wrong. The format callback is ran
but the value is unused (maybe it's useful to set some global vars... or
it's just a bug in Acrobat). Anyway the value to display is the one rendered
in the AP stream.
The field value setter has been simplified and that fixes issue #16409.
This essentially extends PR 11218 to also apply when looking up the final font-reference, via the XRef-table, fails because the font isn't available.
This patch also changes `PartialEvaluator.fallbackFontDict` to simply use "Helvetica" as the default font-name, since that seems generally reasonable given the now existing font-substitution code.
After PR 12563 we're now free to use optional chaining in the worker-thread as well. (This patch also fixes one previously "missed" case in the `web/` folder.)
For the MOZCENTRAL build-target this patch reduces the total bundle-size by `1.6` kilobytes.
On my computer, it takes few tenths of a second to load a local font.
Since a font can be used several times in a document, the cache will
improve performances.
- Replace FoxitSans with LiberationSans: LiberationSans is already there (for XFA) and we can use
it as a good replacement of FoxitSans.
- For now we just try to substitue standard fonts, the strategy is the following:
* we try to find a font locally from a hardcoded list;
* if it fails then we use Liberation as fallback (only for Helvetica for the moment);
* else we just fallback on the system serif/sansserif/monospace font.
This patch updates the minimum supported browsers as follows:
- Safari 15.4, which was released on 2022-03-15; see https://en.wikipedia.org/wiki/Safari_version_history#Safari_15
Nowadays we usually we try, where feasible and possible, to support browsers that are about two years old. The reasons for limiting support to a *somewhat* more recent Safari version include:
- Throughout the history of the PDF.js project, Safari has always been the worst browser to attempt to support. Compared to other browsers there's a disproportionate number of bugs affecting Safari, especially on iOS, and in most cases those are browser-specific issues that we simply cannot address.[1]
- Safari has often been a lot slower, compared to other browsers, at implementing new web-platform features. Historically this has sometimes blocked usage of new features, for the benefit of the Firefox PDF Viewer, and it's very often meant having to include and maintain polyfills *only* for Safari.
- The current (minimum) supported Safari version lack enough functionality that polyfills placed in the `src/shared/compatibility.js` file are unfortunately not sufficient, but it also requires a bunch of special-cases in both the `gulpfile` and in the `web/`-code.
- Given that the *built-in* Firefox PDF Viewer is the primary development target for the PDF.js library, and the general development pace these days, we need to limit the maintenance "overhead" caused by other browsers.
---
[1] In a few cases a work-around might be possible, however it'd negatively affect e.g. performance, readability, and/or maintainability of the code.
Originally we only used the `structuredClone` polyfill in the `LoopbackPort`-implementation, and that obviously isn't used anywhere within the various image decoders.
At this point in time we've started to use `structuredClone` a little bit more, hence it seems overall simpler to just bundle the polyfill even in the `legacy`-version of the IMAGE_DECODERS built-target.
For some time these checks have only targeted Node.js environments, since the features in question exist in all supported browsers (even when a `legacy`-build is used).
Now that we've updated the minimum supported Node.js version to 18, a number of polyfills are thus (finally) no longer necessary in that environment. Hence for certain *basic* functionality, such as e.g. text-extraction, it's now possible to use either a modern- or a `legacy`-build of the PDF.js library in Node.js environments.
*Please note:* For e.g. canvas-rendering in Node.js environments it's still necessary to use a `legacy`-build, since that functionality requires various polyfills.
This patch updates the minimum supported environments as follows:
- Node.js 18, which was released on 2022-04-19; see https://en.wikipedia.org/wiki/Node.js#Releases
Note also that Node.js 16 will soon reach EOL, and thus no longer receive any security updates.
The /Decode-implementation in the our JPEG decoder, i.e. `src/core/jpg.js`, seems to only handle *inverting* of images properly. To support arbitrary /Decode-entries correctly we'll always use the `PDFImage.decodeBuffer` method, even for "simple" JPEG images, which should be fine since non-default /Decode-entries aren't a very common occurrence.
*Please note:* This patch will lead to a little bit of movement in some existing test-cases, however it should be virtually imperceivable to the naked eye.
This property was added in PR 12726 specifically for use in the `getFontType` function, indirectly used by the `PDFDocumentProxy.stats` getter in the API.
In PR 15880 that functionality was removed, but I forgot to remove this now unused font-property.
Now that we no longer depend on the old Babel version in SystemJS we can remove the `static get ...` work-arounds used to define constants, which leads to slightly more compact code.
Now that https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 has landed in Firefox, we're able to use worker-modules during development :-)
This removes the final piece of SystemJS usage from the PDF.js library, thus allowing a fair bit of clean-up, and we now use *only* native `import`/`export` statements everywhere in development mode.
When the `GlobalImageCache` implementation originally landed, back in PR 11912, the image handling was slightly more complex (with e.g. browser-decoding of some JPEG images). At this point it no longer seems necessary to manually handle pageIndexes in this way, and we should be able to simply inline that in the `GlobalImageCache.shouldCache` method.
This method was added in PR 4938, almost nine years ago, however it doesn't appear to ever have been used.
Given the similarities between the `PDF17` and `PDF20` classes, and how they're used, if the `PDF20.hash` method was actually necessary you'd also expect a similiar method in the `PDF17` class.
The "binary" CMap-format is specific to the PDF.js library, and is used to reduce the size of the built-in CMap data-files.
By moving this code to its own file we can remove the nowadays unnecessary closures, which helps to slightly reduce the size of this code.
Some arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized
when creating the search query. So to avoid to duplicate the normalization code,
everything is moved in the find controller.
The previous code to normalize text was using NFKC but with a hardcoded map, hence it
has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size
by 30kb).
In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into
account some RTL unicode ranges, the generated font wasn't embedding the mapping this
char and the unicode ranges in the OS/2 table weren't up-to-date.
When normalized some chars can be replaced by several ones and it induced to have
some extra chars in the text layer. To avoid any regression, when copying some text
from the text layer, a copied string is normalized (NFKC) before being put in the
clipboard (it works like this in either Acrobat or Chrome).
This *special* build-target is very old, and was introduced with the first pre-processor that only uses comments to enable/disable code.
When the new pre-processor was added `PRODUCTION` effectively became redundant, at least in JavaScript code, since `typeof PDFJSDev === "undefined"` checks now do the same thing.
This patch proposes that we remove `PRODUCTION` from the JavaScript code, since that simplifies the conditions and thus improves readability in many cases.
*Please note:* There's not, nor has there ever been, any gulp-task that set `PRODUCTION = false` during building.
This method was originally added in PR 1320, eleven years ago, however it doesn't appear to ever have been used (not even from the start).
Furthermore, this method also tries to access a property that doesn't exist (`this.out`) and then call a method that also doesn't exist (`writeByteArray`).
In looking at https://bugs.ghostscript.com/show_bug.cgi?id=706451 I noticed that bug2.pdf was pretty
slow to load for such a basic file.
In profiling I noticed that a lot of time is spent in Array.concat, hence this patch use Array.push when
it's possible (it's now ~3 times faster).
The changes in PR 16238 were intended specifically for Node.js environments, however they accidentally applied to older browsers as well.
*Please note:* In up-to-date browsers `Path2D` is available in Workers, which should be connected to the introduction of `OffscreenCanvas`.
Apparently the `structuredClone` polyfill doesn't handle transfers correctly, and `DOMException`s may thus be thrown. This is particularly problematical in Node.js environments, where that exception (obviously) isn't available.
To work-around these issues we'll simply ignore any transfers in `legacy`-builds, since those *may* use the `structuredClone` polyfill. This will obviously lead to slightly higher memory usage in those builds, however this really only affects Node.js environments. (Browsers are only affected if workers are disabled, however that's never been an officially recommended/supported configuration.)
Currently we have two separate image-caches on the worker-thread:
- A local one, which is unique to each `PartialEvaluator.getOperatorList` invocation. This one caches both names *and* references, since image-resources may be accessed in either way.
- A global one, which applies to the entire PDF documents and all its pages. This one only caches references, since nothing else would work.
This patch introduces a third image-cache, which essentially sits "between" the two existing ones. The new `RegionalImageCache`[1] will be usable throughout a `PartialEvaluator` instance, and consequently it *only* caches references, which thus allows us to keep track of repeated image-resources found in e.g. different /Form and /SMask objects.
---
[1] For lack of a better word, since naming things is hard...
*Please note:* This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons.
This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous *default* text-extraction behaviour.
However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in *more* (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together.
Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.
Originally we used helper functions for checking if something was a Dictionary or Stream, and then having an initial `typeof` check probably made sense.
However, given that we're using `instanceof` nowadays the additional check longer seems necessary.
Currently we're *virtually* duplicating the same code, for validating quotation marks, twice in this helper function.
The size decrease is quite small (107 bytes) and this makes the code slightly harder to reader, hence I completely understand if this patch is rejected.
Given that this functionality only applies in the viewer, when `PDFBug` is being enabled and used, it can't hurt to slightly reduce the size of this code.
Having just reviewed a patch touching this code, I couldn't help noticing that an `Object` isn't really the optimal data-structure for this and nowadays we can do better by using a `Set` instead.