- PR #13257 fixed a lot of issues but not all and this patch aims to fix almost all remaining issues.
- the idea in this new patch is to compare position of new glyph with the last position where a glyph has been drawn;
- no space are "drawn": it just moves the cursor but they aren't added in the chunk;
- so this way a space followed by a cursor move can be treated as only one space: it helps to merge all spaces into one.
- to make difference between real spaces and tracking ones, we used a factor of the space width (from the font)
- it was a pretty good idea in general but it fails with some fonts where space was too big:
- in Poppler, they're using a factor of the font size: this is an excellent idea (<= 0.1 * fontSize implies tracking space).
*Please note:* This is a tentative patch, since I don't have the necessary a11y-software to actually test it.
To avoid having to add a new API-method just for a single string, I figured that adding the new property to the existing `documentInfo`-data (accessed via `PDFDocumentProxy.getMetadata` in the API) will hopefully be deemed acceptable.
Looking at the code, I do have to agree with the point made in issue 12731 about it being unexpected/unhelpful that the `PDFFindController.executeCommand`-method isn't directly usable with the "find"-event.
The reason for it being this way is, as so often, for historical reasons: The `executeCommand`-method was added (just) prior to the introduction of the `EventBus` in the viewer.
Obviously we cannot simply change the existing `PDFFindController.executeCommand`-method, since that'd be a breaking change in code which has existed for over five years.
Initially I figured that we could simply add a new method in `PDFFindController` that'd accept the state from the "find"-event, however after thinking about this and looking through the use-cases in the default viewer I settled on a slightly different approach: Let the `PDFFindController` just listen for the "find"-event (on the `EventBus`-instance) directly instead, which also removes one level of (unneeded) indirection during searching in the default viewer.
For GENERIC builds of the PDF.js library, the old `PDFFindController.executeCommand`-method is still available with a deprecation warning.
Note how both the annotationLayer and the document outline will apply various URL-related options when creating the link-elements.
For consistency the `xfaLayer`-rendering should obviously use the same options, to ensure that the existing options are indeed applied to all URLs regardless of where they originate.
Having recently worked with, and reviewed patches touching, this code it seemed that it's probably not a bad idea to move that functionality into `createValidAbsoluteUrl` as new options instead.
For the `addDefaultProtocolToUrl` functionality in particular, the existing helper function was not only moved but slightly improved as well. Looking at the code, I realized that there's a small risk that it would incorrectly match a *relative* URL-string too.
With these changes, the `createValidAbsoluteUrl` call-sites in the `src/core/`-code can be simplified a little bit.
*Please note:* This patch may, indirectly, change the format of the `unsafeUrl`-property returned with relevant Annotations and OutlineItems; hence the `api-minor` tag.
However, I'd argue that it's actually more correct this way since the whole purpose of `unsafeUrl` is/was to return the URL data as-is without any parsing done.
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1716758;
- some buttons have a JS action with the pattern `app.launchURL(...)` (or similar) so extract when it's possible the url and generate a <a> element with the href equals to the found url;
- pdf.js already had some code to handle that so this patch slightly refactor that.
In order to implement this, we utilize the existing `bidi` function to infer the text-direction of /T and /Contents entries. While this may not be perfect in cases where one PopupAnnotation mixes LTR and RTL languages, it should work well enough in most cases.
To avoid having to add *two new* properties in lots of annotations, supplementing the existing `title`/`contents`-properties, this patch instead re-factors the existing code such that the properties are replaced by Objects (containing `str` and `dir`).
*Please note:* In order avoid breaking existing third-party implementations, `GENERIC`-builds of the PDF.js library will still provide the old `title`/`contents`-properties on annotations returned by `PDFPageProxy.getAnnotations`.
- In the pdf in issue #14071, some select fields don't contain any values;
- the corresponding node has a bindItems and a bind elements and _bindItems function was just not called.
With this patch we'll ensure that only valid absolute URLs can be used in XFA documents, similar to the existing validation done for "regular" PDF documents.
Furthermore, we'll also attempt to add a default protocol (i.e. `http`) to URLs beginning with "www." in XFA documents as well; this on its own is enough to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1731240
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1719148;
- JS can set a property for a non-rendered annotation using the annotationStorage but the other unset default properties must be used when the annotation is finally rendered;
- so this patch just adds the properties already set in the annotationStorage to the default value.
*This fixes something that I noticed, having recently looked at both the `Lexer.getObj` and `writeValue` code.*
Please note that I unfortunately don't have an example of a form where saving fails without this patch. However, given its overall simplicity and that unit-tests are added, it's hopefully deemed useful to fix this potential issue pro-actively rather than waiting for a bug report.
At this point one might, and rightly so, wonder if there's actually any real-world PDF documents where a `null` value is being used?
Unfortunately the answer is *yes*, and we have a couple of examples in the test-suite (although none of those are related to forms); please see: `issue1015`, `issue2642`, `issue10402`, `issue12823`, `issue13823`, and `pr12564`.
- aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1720179
- in some pdfs the XFA array in AcroForm dictionary doesn't contain an entry for 'datasets' (which contains saved data), so basically this patch allows to overwrite the AcroForm dictionary with an updated XFA array when doing an incremental update.
- when some named nodes in the template don't have their counterpart in datasets we create some nodes: the main node mustn't belong to the datasets namespace because it doesn't make sense and Acrobat Reader isn't able to read pdf with such nodes.
- so created nodes under a datasets node have a namespaceId set to -1 and consequently when serialized no namespace prefix will appear.
There's a fair number of regular expressions througout the code-base which are slightly more verbose than strictly necessary, in particular:
- We have a lot of regular expressions that use `[0-9]` explicitly, and those can be simplified to use `\d` instead.
- We have one instance of a regular expression containing a `A-Za-z0-9_` sequence, which can be simplified to use `\w` instead.
*This is a follow-up to PRs 13867 and 13899.*
This patch is tagged `api-minor` for the following reasons:
- It replaces the `renderInteractiveForms`/`includeAnnotationStorage`-options, in the `PDFPageProxy.render`-method, with the single `annotationMode`-option that controls which annotations are being rendered and how. Note that the old options were mutually exclusive, and setting both to `true` would result in undefined behaviour.
- For improved consistency in the API, the `annotationMode`-option will also work together with the `PDFPageProxy.getOperatorList`-method.
- It's now also possible to disable *all* annotation rendering in both the API and the Viewer, since the other changes meant that this could now be supported with a single added line on the worker-thread[1]; fixes 7282.
---
[1] Please note that in order to simplify the overall implementation, we'll purposely only support disabling of *all* annotations and that the option is being shared between the API and the Viewer. For any more "specialized" use-cases, where e.g. only some annotation-types are being rendered and/or the API and Viewer render different sets of annotations, that'll have to be handled in third-party implementations/forks of the PDF.js code-base.
This patch removes the only remaining closure in the `src/display/api.js` file, utilizing a similar approach as used in lots of other parts of the code-base, which results in a small decrease in the size of the *build* `pdf.js` file.
Given that `PDFWorker` is exposed through the *public* API, this complicates things somewhat since there's a couple of worker-related properties that really should stay *private*. Initially, while working on PR 13813, I believed that we'd need support for private (static) class fields in order to get rid of this closure, however I've managed to come up with what's hopefully deemed an acceptable work-around here.
Furthermore, some helper functions were simply moved into the `PDFWorker` class as static methods, thus simplifying the overall implementation (e.g. we don't need to manually cache the Promise in the `PDFWorker._setupFakeWorkerGlobal`-method).
Finally, as part of this re-factoring a number of missing JSDoc-comments were added which *together* with the removal of the closure significantly improves the `gulp jsdoc` output for the `PDFWorker` class.
*Please note:* This patch is tagged with `api-minor` since it deprecates `PDFWorker.getWorkerSrc()` in favor of the shorter `PDFWorker.workerSrc`, with the fallback limited to `GENERIC` builds.
With the changes made in PR 13746 the *internal* renderingIntent handling became somewhat "messy", since we're now having to do string-matching in various spots in order to handle the "oplist"-intent correctly.
Hence this patch, which implements the idea from PR 13746 to convert the `intent`-strings, used in various API-methods, into an *internal* renderingIntent that's implemented using a bit-field instead. *Please note:* This part of the patch, in itself, does *not* change the public API (but see below).
This patch is tagged `api-minor` for the following reasons:
1. It changes the *default* value for the `intent` parameter, in the `PDFPageProxy.getAnnotations` method, to "display" in order to be consistent across the API.
2. In order to get *all* annotations, with the `PDFPageProxy.getAnnotations` method, you now need to explicitly set "any" as the `intent` parameter.
3. The `PDFPageProxy.getOperatorList` method will now also support the new "any" intent, to allow accessing the operatorList of all annotations (limited to those types that have one).
4. Finally, for consistency across the API, the `PDFPageProxy.render` method also support the new "any" intent (although I'm not sure how useful that'll be).
Points 1 and 2 above are the significant, and thus breaking, changes in *default* behaviour here. However, unfortunately I cannot see a good way to improve the overall API while also keeping `PDFPageProxy.getAnnotations` unchanged.
Given how trivial the `isEOF` function is, we can simply inline the check at the various call-sites and remove the function (which ought to be ever so slightly more efficient as well).
Furthermore, this patch also changes the `EOF` primitive itself to a `Symbol` instead of an Object since that has the nice benefit of making it unclonable (thus preventing *accidentally* trying to send `EOF` from the worker-thread).
This patch utilizes the same approach as used in lots of other parts of the code-base, which thus *slightly* reduces the size of this code.
By removing some of the (current) indirection, we can also simplify the JSDocs a little bit. Looking at the `gulp jsdoc` output, this actually seem to *improve* the documentation for this class.
With this patch, the `PDFPageProxy.getOperatorList` method will now return `PDFOperatorList`-instances that also include Annotation-operatorLists (when those exist). Hence this closes a small, but potentially confusing, gap between the `render` and `getOperatorList` methods.
Previously we've been somewhat reluctant to do this, as explained below, but given that there's actual use-cases where it's required probably means that we'll *have* to implement it now.
Since we still need the ability to separate "normal" rendering operations from direct `getOperatorList` calls in the worker-thread, this API-change unfortunately causes the *internal* renderingIntent to become a bit "messy" which is indeed unfortunate (note the `"oplist-"` strings in various spots). As-is I suppose that it's not all that bad, but we may want to consider changing the *internal* renderingIntent to e.g. a bitfield in the future.
Besides fixing issue 13704, this patch would also be necessary if someone ever tries to implement e.g. issue 10165 (since currently `PDFPageProxy.getOperatorList` doesn't include Annotation-operatorLists).
*Please note:* This patch is *also* tagged "api-minor" for a second reason, which is that we're now including the Annotation-id in the `beginAnnotation` argument. The reason for this is to allow correlating the Annotation-data returned by `PDFPageProxy.getAnnotations`, with its corresponding operatorList-data (for those Annotations that have it).