- font line height is taken into account by acrobat when it isn't with masterpdfeditor: I extracted a font from a pdf, modified some ascent/descent properties thanks to ttx and the reinjected the font in the pdf: only Acrobat is taken it into account. So in this patch, line heights for some substituted fonts are added.
- it seems that Acrobat is using a line height of 1.2 when the line height in the font is not enough (it's the only way I found to fix correctly bug 1718741).
- don't use flex in wrapper container (which was causing an horizontal overflow in the above bug).
- consequently, the above fixes introduced a lot of small regressions, so in order to see real improvements on reftests, I fixed the regressions in this patch:
- replace margin by padding in some case where padding is a part of a container dimensions;
- remove some flex display: some containers are wrongly sized when rendered;
- set letter-spacing to 0.01px: it helps to be sure that text is not broken because of not enough width in Firefox.
Previously, when we filled image masks we didn't copy over the current transformation,
this caused patterns to be misaligned when painted. Now we create a temporary
canvas with the mask and have the transform copied over and offset it relative to
where the mask would be painted. We also weren't properly offsetting tiling patterns.
This isn't usually noticeable since patters repeat, but in the case of #13561 the pattern
is only drawn once and has to be in the correct position to line up with the mask image.
These fixes broke #11473, but highlighted that we were drawing that correctly by
accident and not correctly handling negative bounding boxes on tiling patterns.
Fixes#6297, #13561, #13441
Partially fixes#1344 (still blurry but boxes are in correct position now)
- Fix a typo in order to open the pdf in issue #13679
- After fixing the fill default color there wer some regressions because of z-index
and when fixing z-index there were some regressions because of borders
- So fix the borders rendering.
*While I cannot guarantee that this will fix the recent intermittents, this patch really shouldn't hurt.*
By setting the Image `src` first, there's a small possibility that the Image is loaded *before* we've had a change to attach the `onload`/`onerror` callbacks which may cause the Promise to remain in a pending state.
Note that prior to PR 13641 we didn't correctly await all image resources to actually load, which could explain the very recent intermittent test-failures.
The PDF.js API has only ever supported accessing the original file ID, however the second one that (should) exist in *modified* documents have thus far been completely inaccessible through the API.
That seems like a simple oversight, caused e.g. by the viewer not needing it, since it really shouldn't hurt to provide API-users with the ability to check if a PDF document has been modified since its creation.[1]
Please refer to https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G13.2261661 for additional information.
For an example of how to update existing code to use the new API, please see the changes in the `web/app.js` file included in this patch.
*Please note:* While I'm not sure if we'll ever be able to remove the old `PDFDocumentProxy.fingerprint` getter, given that it's existed since "forever", that probably isn't a big deal given that it's now limited to only `GENERIC`-builds.
---
[1] Although this obviously depends on the PDF software following the specification, by updating the second file ID as intended.
- it aims to fix#13583;
- fix the switch to breakBefore target;
- force the layout of an unsplittable element on an empty page;
- don't fail when there is horizontal overflow (except in lr-tb);
- handle correctly overflow in the same content area (bug 1717805, bug 1717668);
- fix a typo in radial gradient first argument.
- the break element has been deprecated in XFA 2.4 but some old documents can use it, so replace it with one (or more) of its possible substitutions:
- breakBefore;
- breakAfter;
- overflow.
- when binding (after parsing) we get a map between some template nodes and some data nodes;
- so set user data in input handlers in using data node uids in the annotation storage;
- to save the form, just put the value we have in the storage in the correct data nodes, serialize the xml as a string and then write the string at the end of the pdf using src/core/writer.js;
- fix few bugs around data bindings:
- the "Off" issue in Bug 1716980.
According to a comment in `readCmapTable`, we're assuming that the cmap tables (when more than one exist) are sorted in ascending order. If that's not the case, keep checking the following cmap tables in order to fix the referenced issue.
- when the CSS line-height property is set to 'normal' then the value depends of the user agent. So use a line height based on the font itself and if for any reasons this value is not available use 1.2 as default.
- it's a partial fix for https://bugzilla.mozilla.org/show_bug.cgi?id=1717681.
Apparently some really bad PDF software can create documents with *empty* `Name`-entries, which we thus need to somehow deal with.
While I don't know if this patch is necessarily the best solution, it should at least ensure that the *empty* `Name`-instance cannot accidentally match a proper `Name`-instance (and it doesn't require changes to a lot of existing code).[1]
---
[1] I briefly considered using a `Symbol` rather than an Object, but quickly decided against that since the former one [is not clonable](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#supported_types) and `Name`-instances may be sent to the API.
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1716838.
- some fonts in the pdf in the bug where bold when they shouldn't so write the font properties in the html to avoid to use some wrong inherited ones.
- partial fix for https://bugzilla.mozilla.org/show_bug.cgi?id=1716980;
- some pdf can contain an invalid font family (e.g. 'Windings 3') so in this case remove the space;
- the font family in typeface attribute doesn't always match the one defined in the FontDescriptor dictionary.
- PR #13554 is buggy, so this patch aims to fix bugs.
- check if a component fits into its parent in taking into account the parent layout.
- introduce method isSplittable for template nodes to know if a component can be splitted in case of overflow.
- some containers doesn't always have their 2 dimensions and those dimensions re based on contents;
- so in order to measure text, we must get the glyph widths (for the xfa fonts) before starting the layout;
- implement a word-wrap algorithm;
- handle font change during text layout.
Note, this only really fixes Radial/Axial shading patterns with masks.
I'm guessing tiling patterns and mesh patterns would also be broken
if applied like the test pdf. Hopefully I'll have some time to make
test cases for the other shadings.
Fixes#13372
- and fix few bugs:
- avoid infinite loop when layout the document;
- avoid confusion between break and layout failure;
- don't add margin width in tb layout when getting available space.
By adding basic linting of JSON files, we can ensure that they're actually valid and prevent e.g. test-failures caused by *accidental* errors when editing the `test/test_manifest.json` file (something that I've done *many* times myself).
For now this simply uses the `recommended` configuration, but we can obviously tweak this later if/when needed. Please find additional information at https://github.com/azeemba/eslint-plugin-json
- it aims to avoid to loop forever when opening pdf in #13213;
- the idea is to consider subformSet as inexistent when running in the tree. So if we've subformA > subformSet > subformB then subformB will be visited as a direct child of subformA.
*This implementation is basically a copy of the pre-existing `builtInCMapCache` implementation.*
For some, badly generated, PDF documents it's possible that we'll end up having to fetch the *same* standard font data over and over (which is obviously inefficient).
While not common, it's certainly possible that a PDF document uses *custom* font names where the actual font then references one of the standard fonts; see e.g. issue 11399 for one such example.
Note that I did suggest adding worker-thread caching of standard font data in PR 12726, however it wasn't deemed necessary at the time. Now that we have a real-world example that benefit from caching, I think that we should simply implement this now.
- Some js files contain scale factors for each glyph in order to rescale Liberation to have a final font with the correct width.
- A lot of XFA have some containers where their dimensions are based on their text content, so using default font from browser can lead to an almost unreadable pdf.
This patch uses the new option added in PR 12726 to *also* allow fetching binary CMap data directly in the worker-thread in browsers.
Given that these changes remove the need to transfer data between threads for the default (browser) use-case, we can also revert the changes in PR 11118 since that simplifies the overall implementation.
- some elements weren't displayed because their rotation angle was not taken into account;
- fix box model (XFA concept):
- remove use of outline;
- position correctly border which isn't part of box dimensions;
- fix margins issues (see issue #13474).
- move border on button instead of having it on wrapping div;
For Type3 fonts where the /CharProcs-streams of the individual glyph starts with a `d1` operator, we can use that to build a fallback bounding box for the font and thus improve text-selection in some cases.
- attribute 'use' was already implemented but not usehref
- in general, usehref should make reference to current document
- add support for SOM expressions in use and usehref to search a node.
- get prototype for all nodes if any.
Given that the same `PartialEvaluator`-instance is used for a lot of these unit-tests, manually changing the options in any one test-case could lead to intermittently failing unit-tests since they're run in a random order.
To fix this, we simply have to use the existing method to clone the `PartialEvaluator`-instance but with the custom options.
For HighlightAnnotations with a built-in appearance stream, we still rely on it to specify the opacity correctly via a suitable blend mode. However, if the Annotation-drawing operators are placed *within* a /XObject of the /Form-type, the /ExtGState won't apply to the final rendering and the result is that the highlighting obscures the underlying text.
The more *correct* and general solution would likely be to somehow modify the implementation in `src/display/canvas.js`, to special-case handling of /Form-type /XObjects when rendering Annotations. Since we can very easily work-around this problem for now by using the "no appearance stream" code-path, doing *something* here ought to be preferable.
This patch is (obviously) merely a work-around, but given that the referenced issue is (as far as I know) the first case we've seen of this problem a simple solution will hopefully suffice for now.
This fixes the colours, by respecting the strokeAlpha/fillAlpha-values, for a couple of Annotations in the PDF document from issue 13447.[1]
---
[1] Some of the annotations still won't render at all, when compared with Adobe Reader, but that could/should probably be handled separately.
- the only goal of this patch is to be able to get synchronously the fake html when printing from firefox:
- in order to print we need to inject some html in beforeprint callback but we cannot block in waiting for all the pages.
- from a memory point of view: it doesn't change anything since the fake HTML is deleted in the worker;
- this way we don't break any assumptions.
- I thought it was possible to rely on browser layout engine to handle layout stuff but it isn't possible
- mainly because when a contentArea overflows, we must continue to layout in the next contentArea
- when no more contentArea is available then we must go to the next page...
- we must handle breakBefore and breakAfter which allows to "break" the layout to go to the next container
- Sometimes some containers don't provide their dimensions so we must compute them in order to know where to put
them in their parents but to compute those dimensions we need to layout the container itself...
- See top of file layout.js for more explanations about layout.
- fix few bugs in other places I met during my work on layout.
This code is old and predates the improvements we made to the test
manifest to only contain working URLs (either Web Archive or
GitHub/Bugzilla links), so the fallback logic to try the Web Archive is
no longer necessary. This greatly simplifies the function and also
makes sure that we fail directly in case a bad URL is added to the
manifest, instead of having it work "accidentally" because of this
logic, since we want the manifest to be correct at all times (and
otherwise fail loudly).
According to the specification, see https://web.archive.org/web/20210404042322if_/https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.2384179, the keys of a NameTree/NumberTree should be ordered.
For corrupt PDF files, which violate this assumption, it's thus possible that trying to lookup a single entry fails.
Previously, in PR 10274, we implemented a fallback that only applies to the "bottom" node of a NameTree/NumberTree, which in general might not actually help for sufficiently corrupt NameTree/NumberTree data.
Instead we remove the current *limited* fallback from `NameOrNumberTree.get`, and defer to the call-site to handle this case explicitly e.g. by using `NameOrNumberTree.getAll` for data where that makes sense. For well-formed documents, these changes should *not* lead to any additional data fetching/parsing.
Finally, as part of these changes, the validation of named destination data is improved in the `Catalog` and a new unit-test is also added.
To get the maximum benefit from something like Prettier, you obviously don't want to disable the automatic formatting unless absolutely necessary. When we added Prettier there were a number of cases, mostly involving larger Arrays, which required disabling of the automatic formatting for overall readability and/or to not break inline comments.
With changes in Prettier version `2.3.0`, see [the release notes](https://prettier.io/blog/2021/05/09/2.3.0.html#concise-formatting-of-number-only-arrays-10106httpsgithubcomprettierprettierpull10106-10160httpsgithubcomprettierprettierpull10160-by-thorn0httpsgithubcomthorn0), there's now better formatting support for Arrays containing only numbers. Hence we can now remove a number of `// prettier-ignore` comments, and thus get the benefit of automatic formatting in (slightly) more of the code-base.
- Right now, a glyph with an erroneous outline is replaced by an empty glyph
if the error is far enough from the start there's likely something to render
so the idea is to replace a command with args by an endchar when no args are
on the stack: this way OTS is likely happy (no remaining args on stack) and we
can draw something which is likely better than nothing.
This patch was tested using the PDF file from issue 2618, i.e. https://bug570667.bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:
```
[
{ "id": "issue2618",
"file": "../web/pdfs/issue2618.pdf",
"md5": "",
"rounds": 50,
"type": "eq"
}
]
```
which gave the following results when comparing this patch against the `master` branch:
```
-- Grouped By browser, stat --
browser | stat | Count | Baseline(ms) | Current(ms) | +/- | % | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ---- | -------------
firefox | Overall | 50 | 3417 | 3426 | 9 | 0.27 |
firefox | Page Request | 50 | 1 | 1 | 0 | 5.41 |
firefox | Rendering | 50 | 3416 | 3426 | 9 | 0.27 |
```
Based on these results, there's no significant performance regression from using standard classes and this patch should thus be OK.
Previously, we set the base transformation and pattern matrix
directly to the main rendering ctx of the page, however doing this
caused the current transform to be lost. This would cause issues
with things like shear missing so the pattern was misaligned or when
stroke was used the scale of the line width or dash would be wrong.
Instead we should leave the current transform and use setTransfrom
on the pattern so it is applied correctly. For axial and radial shadings I had
to create a temporary canvas to draw the shading so I could in turn
use setTransform.
Fixes: #13325, #6769, #7847, #11018, #11597, #11473
The following already in the corpus are improved:
issue8078-page1
issue1877-page1
Without this some *composite* fonts may incorrectly end up with matching `hash`es, thus breaking rendering since we'll not actually try to load/parse some of the fonts.
*Please note:* Given that the document, in the referenced issue, doesn't embed *any* of its fonts there's no guarantee that it renders correctly in all configurations even with this patch.
- app.alert and few other function can use an object as parameter ({cMsg: ...});
- support app.alert with a question and a yes/no answer;
- update field siblings when one is changed in an action;
- stop calculation if calculate is set to false in the middle of calculations;
- get a boolean for checkboxes when they've been set through annotationStorage instead of a string.
- `FontFlags`, is used in both `src/core/fonts.js` and `src/core/evaluator.js`.
- `getFontType`, same as the above.
- `MacStandardGlyphOrdering`, is a fairly large data-structure and `src/core/fonts.js` is already a *very* large file.
- `recoverGlyphName`, a dependency of `type1FontGlyphMapping`; please see below.
- `SEAC_ANALYSIS_ENABLED`, is used by both `Type1Font`, `CFFFont`, and unit-tests; please see below.
- `type1FontGlyphMapping`, is used by both `Type1Font` and `CFFFont` which a later patch will move to their own files.
- Improve chunking in order to fix some bugs where the spaces aren't here:
* track the last position where a glyph has been drawn;
* when a new glyph (first glyph in a chunk) is added then compare its position with the last saved one and add a space or break:
- there are multiple ways to move the glyphs and to avoid to have to deal with all the different possibilities it's a way easier to just compare positions;
- and so there is now one function (i.e. "compareWithLastPosition") where all the job is done.
- Add some breaks in order to get lines;
- Remove the multiple whites spaces:
* some spaces were filled with several whites spaces and so it makes harder to find some sequences of words using the search tool;
* other pdf readers replace spaces by one white space.
Update src/core/evaluator.js
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
First of all, while it should be very unlikely that the /ID-entry is an *indirect* object, note how we're using `Dict.get` when parsing it e.g. in `PDFDocument.fingerprint`. Hence we definitely should be consistent here, since if the /ID-entry is an *indirect* object the existing code in `src/core/writer.js` would already fail.
Secondly, to fix the referenced issue, we also need to check that the /ID-entry actually is an Array before attempting to access its contents in `src/core/writer.js`.
*Drive-by change:* In the `xrefInfo` object passed to the `incrementalUpdate` function, re-name the `encrypt` property to `encryptRef` since its data is fetched using `Dict.getRaw` (given the names of the other properties fetched similarly).
By replacing `XMLHttpRequest` with a `fetch` call, the helper function can be modernized to use async/await instead.
Note that the headers doesn't seem necessary to set now, since:
- The Fetch API provides a method for accessing the response as *text*, which renders the "Content-type" header unnecessary.
- According to https://developer.mozilla.org/en-US/docs/Glossary/Forbidden_header_name, the "Content-length" header isn't necessary.
- Use `PDFManager.ensureDoc`, rather than `PDFManager.ensure`, in a couple of spots in the code. If there exists a short-hand format, we should obviously use it whenever possible.
- Fix a unit-test helper, to account for the previous changes. (Also, converts a function to be `async` instead.)
- Add one more exists-check in `PDFDocument.loadXfaFonts`, which I missed to suggest in PR 13146, to prevent any possible errors if the method is ever called in a situation where it shouldn't be.
Also, print a warning if the actual font-loading fails since that could help future debugging. (Finally, reduce overall indentation in the loop.)
- Slightly unrelated, but make a small tweak of a comment in `src/core/fonts.js` to reduce possible confusion.
While I wasn't able to figure out *exactly* why the old format didn't work, re-factoring the `parseOptions` function to use `yargs` differently "just worked" so that's hopefully good enough here.
With these changes everything related to a *particular* option now appears in one place, rather than being spread out, which aids readability in my opinion. Also, the options are now sorted alphabetically, to make it easier to find a particular one.
https://www.npmjs.com/package/yargs
- Different fonts can be used in xfa and some of them are embedded in the pdf.
- Load all the fonts in window.document.
Update src/core/document.js
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Update src/core/worker.js
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
There is no asynchronous code involved here, so we can get rid of all
done callbacks here and simply use the fact that if the function call
ends without failed assertion that the test passed.
There is no asynchronous code involved here, so we can get rid of all
done callbacks here and simply use the fact that if the function call
ends without failed assertion that the test passed.
With the current implementation of `PDFDocument.hasJSActions`, in the worker-thread, we're not actually handling not-yet-loaded data correctly. This can thus fail in *two* different ways:
- The `PDFDocument.fieldObjects` getter (and its helper method), while it may *return* a Promise, still fetches all of its data synchronously and it can thus throw a `MissingDataException` during parsing.
- The `Catalog.jsActions` getter, which is completely synchronous, can obviously throw a `MissingDataException` during parsing.
If either of these cases occur currently, the `PDFDocumentProxy.hasJSActions` method in the API can either return a *rejected* Promise (which it never should) or possibly "hang" and never resolve.
*Please note:* While I've not *yet* seen this error in an actual PDF document, it can happen during loading if you're unlucky enough with e.g. the structure of the PDF document and/or the download speed offered by the server.
This patch is thus based on code-inspection *and* on manually throwing a `MissingDataException` on the first access of `Catalog.jsActions` to simulate this situation.
Finally, this patch adds a couple of *API* unit-tests for this (since none existed).
This is first of all consistent with existing API-methods, where we return `null` when the data in question doesn't exist. Secondly, it should also be (slightly) more efficient since there's less dummy-data that we need to transfer between threads.
Finally, this prevents us from adding an empty/unnecessary span to *every* single page even in documents without any structure tree data.
The done callbacks are an outdated mechanism to signal Jasmine that a
unit test is done, mostly in cases where a unit test needed to wait for
an asynchronous operation to complete before doing its assertions.
Nowadays a much better mechanism is in place for that, namely simply
passing an asynchronous function to Jasmine, so we don't need callbacks
anymore (which require more code and may be more difficult to reason
about).
In these particular cases though the done callbacks never had any real
use since nothing asynchronous happens in these places. Synchronous
functions don't need to use done callbacks since Jasmine simply knows
it's done when the function reaches its normal end, so we can safely get
rid of these callbacks. The telltale sign is if the done callback is
used unconditionally at the end of the function.
This is all done in an effort to over time get rid of all callbacks in
the unit test code.
- but don't validate them for now;
- Firefox will display a bar to warn that the signature validation is not supported (see https://bugzilla.mozilla.org/show_bug.cgi?id=854315)
- almost all (all ?) pdf readers display signatures;
- validation is done in edge but for now it's behind a pref.
When a PDF is "marked" we now generate a separate DOM that represents
the structure tree from the PDF. This DOM is inserted into the <canvas>
element and allows screen readers to walk the tree and have more
information about headings, images, links, etc. To link the structure
tree DOM (which is empty) to the text layer aria-owns is used. This
required modifying the text layer creation so that marked items are
now tracked.
Note how we purposely don't expose the `AnnotationStorage`-class directly in the official API (see `src/pdf.js`), since trying to use *multiple* ones simultaneously doesn't really make sense (e.g. in the viewer).
Instead we lazily initialize, and cache, just *one* instance via `PDFDocumentProxy.annotationStorage` which should thus be available internally in the API itself without having to be manually passed to various methods.
To support these changes, the `AnnotationStorage`-instance initialization is moved into the `WorkerTransport`-class to allow both `PDFDocumentProxy` and `PDFPageProxy` to access it.
This patch implements the following simplifications:
- Remove the `annotationStorage`-parameter from `PDFDocumentProxy.saveDocument`, since it's already available internally.
Furthermore, while it's currently possible to call that method without an `AnnotationStorage`-instance, that really does *not* make any sense at all. In this case you're effectively reducing `PDFDocumentProxy.saveDocument` to a "regular" `PDFDocumentProxy.getData` call, but with *a lot* more overhead, which was obviously not the intention of the `PDFDocumentProxy.saveDocument`-method.
- Try to discourage third-party users from calling `PDFDocumentProxy.saveDocument` unconditionally, as a replacement for `PDFDocumentProxy.getData` (note the previous point).
- Replace the `annotationStorage`-parameter, in `PDFPageProxy.render`, with a boolean `includeAnnotationStorage`-parameter which simply indicates if the (internally available) `AnnotationStorage`-instance should be used during rendering (e.g. for printing).
- By removing the need to *manually* provide `annotationStorage`-parameters to various API-methods, using the API should become simpler (e.g. for third-parties) since you no longer need to worry about manually fetching and passing around this data.
The fontName, as defined in the PDF document, cannot be found in *any* of the "name"-tables in the TrueType Collection font. To work-around that, this patch adds a *fallback* code-path to allow using an approximately matching fontName rather than outright failing.
Currently the `fontName`-property contains an actual /Name-instance, which is a problem given that its fallback value is an empty string; see ca7f546828/src/core/default_appearance.js (L35)
The reason that this is a problem can be seen in ca7f546828/src/core/primitives.js (L30-L34), since an empty string short-circuits the cache. Essentially, in PDF documents, a /Name-instance cannot be empty and the way that the `DefaultAppearanceEvaluator` does things is unfortunately not entirely correct.
Hence the `fontName`-property is changed to instead contain a string, rather than a /Name-instance, which simplifies the code overall.
*Please note:* I'm tagging this patch with "[api-minor]", since PR 12831 is included in the current pre-release (although we're not using the `fontName`-property in the display-layer).
Fixes#13107
In the issue, some TrueType glyph names have the format `uniXXXX`.
Font's `Encoding` dictionary has the entry `Differences` but no
`BaseEncoding`. `uniXXXX` names are converted to glyph indices
using font's `post` table but currently that is done only when
`BaseEncoding` exists. We must enable the conversion also when only
`Differences` exists.
Currently only URL-strings are officially supported by `getDocument`, however at this point in time I cannot really see any compelling reason to not support `URL`-objects as well.
Most likely the reason that we've don't already support `URL`-objects, in `getDocument`, is that historically `URL` wasn't fully implemented across browsers and our old polyfill wasn't perfect; see https://developer.mozilla.org/en-US/docs/Web/API/URL/URL#browser_compatibility
*Please note:* Because of how the `url` parameter is currently handled, there's actually *some* cases where passing a `URL`-object to `getDocument` already works. That, in my opinion, provides additional motivation for supporting `URL`-objects officially, since it makes the API more consistent.
The following is an attempt to summarize the *current* situation, based on the actual code rather than the JSDocs:
- `getDocument("url string")` works and is documented.[1]
- `getDocument({ url: "url string", })` works and is documented.[1]
- `getDocument(new URL(...))` throws immediately, since no supported parameters are found.
- `getDocument({ url: new URL(...), })` actually works even though it's not documented.[1] Originally, when data was fetched on the worker-thread, this would likely have thrown since `URL` isn't clonable.[2]
- `getDocument({ url: { abc: 123, }, })`, or some similarily meaningless input, will be "accepted" by `getDocument` and then throw a `MissingPDFException` when attempting to fetch the bogus data.
With the changes in this patch, not only is `URL`-objects now officially supported and documented when calling `getDocument`, but we'll also do a much better job at actually validating any URL-data passed to `getDocument` (and instead fail early).
---
[1] In *browsers*, we create a valid URL thus indirectly validating the input. In Node.js environments, on the other hand, no validation is done since obtaining a baseUrl is more difficult (and PDF.js is primarily written for browsers anyway).
[2] https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#supported_types
* JS - Handle correctly hierarchy of fields
- it aims to fix#13132;
- annotations can inherit their actions from the parent field;
- there are some fields which act as a container for other fields:
- they can be access through js so need to add them with an empty type (nothing in the spec about that but checked in Acrobat);
- calculation order list (CO) can reference them so need make them through this.getField;
- getArray method must return kids.
- field values are number, string, ... depending of their type but nothing in the spec on how to know what's the type:
- according to the comment for Canonical Format: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#page=461
- it seems that this "type" can be guessed from js action Format (when setting a type in Acrobat DC, the only affected thing is this action).
- util.scand with an empty string returns the current date.
- Actually support *linked* test-cases in the integration-tests (in the same way as the unit-tests).
- Add a new `"type": "other"`-kind to the test-manifest, to support *linked* test-cases in the unit/integration-tests without requiring the PDF document in question to also be a reference-test.
- implement few positioning properties: position, width, height, anchor;
- implement font element;
- implement fill element (used by font) and its children (linear, radial, ...);
- font property is inherited from ancestor container (see https://www.pdfa.org/wp-content/uploads/2020/07/XFA-3_3.pdf#page=43) so let CSS handles that stuff;
- in order to reduce the number of properties to set, only set non default properties and put the default in CSS;
- set a background to some containers to be able to see them (will be removed in a future commit).
- add an option to enable XFA rendering if any;
- for now, let the canvas layer: it could be useful to implement XFAF forms (embedded pdf in xml stream for the background and xfa form for the foreground);
- ui elements in template DOM are pretty close to their html counterpart so we generate a fake html DOM from template one:
- it makes easier to translate template properties to html ones;
- it makes faster the creation of the html element in the main thread.
It seems reasonable to place this alongside the *similar* `getFilenameFromUrl` helper function. This way, with the changes in the next patch, we also avoid having to expose the `isDataScheme` function in the API itself and we instead expose `getPdfFilenameFromUrl` in the API (which feels overall more appropriate).
This extends PR 13033 slightly, with a heuristic to support corrupt PDF documents where the `LineAnnotation`s have an empty /Rect-entry. Please note that while I have no idea if this is "correct", this patch at least makes us output the same /BBox as re-saving in Adobe Reader does.
A significant portion of the code-base has now been converted to use `let`/`const`, rather than `var`, hence it should be possible to simply enable the ESLint `no-var` rule globally.
This way we can ensure that new code won't accidentally use `var`, and it also removes the need to manually enable the rule in various folders.
Obviously it makes sense to continue the efforts to replace `var`, but that should probably happen on a file and/or folder basis.
Please note that this patch excludes the following code:
- The `extensions/` folder, since that seemed easiest for now (and I don't know exactly what the support situation is for the Chromium-extension).
- The entire `external/` folder is ignored, since most of it's currently excluded from linting.
For the code that isn't imported from elsewhere (and should be ignored), we should probably (at some point) bring the code up to the same linting/formatting standard as the rest of the code-base.
- Various files in the `test/` folder are ignored, as necessary, since the way that a lot of this code is loaded will require some care (or perhaps larger re-factoring) when removing `var` usage.
Currently errors occurring within the `src/display/{text_layer, annotation_layer}.js` files are not being handled properly by the test-suite, and the tests simply time out rather than failing as intended.
This makes it *very* easy to accidentally overlook a certain type of errors, see e.g. https://github.com/mozilla/pdf.js/pull/13055#discussion_r589005041, which this patch will thus prevent.
These tests, and their [accompanying Wiki page](https://github.com/mozilla/pdf.js/wiki/Required-Browser-Features), haven't received any real updates for *many years* and are sufficiently out of date to be effectively useless now.
Providing *irrelevant* compatibility information seems overall worse than not providing any information, and as suggested in the issue it'd probably be better to use https://github.com/mozilla/pdf.js#online-demo for checking if a particular platform/browser is supported.
Thanks to version control, it's easy to restore these files should the need ever arise. However, re-introducing these tests would essentially require updating every single test-case *and* a commitment to keeping them up to date with future code changes.
- strokeColor corresponds to borderColor;
- support fillColor and textColor;
- support colors on the different annotations;
- fix typo in aforms (+test).
Rather than converting the `AnnotationStorage`-data to an Object, before sending it to the worker-thread, we should be able to simply send the internal `Map` directly.
The "structured clone algorithm" doesn't have a problem with `Map`s, however the `LoopbackPort` used when workers are *disabled* (e.g. in Node.js environments) didn't use to support them. With PR 12997 having lifted that restriction, we should now be able to simply send the `AnnotationStorage`-data as-is rather than having to iterate through it to first create an Object.
*Please note:* The changes in `src/core/annotation.js` could have been a lot more compact if we were able to use optional chaining in the `src/core` folder. Unfortunately that's still not possible, since SystemJS is being used in the development viewer (i.g. `gulp server`) and fixing that is *still* blocked by [bug 1247687](https://bugzilla.mozilla.org/show_bug.cgi?id=1247687).
With the previous patch this function is now *only* accessed on the worker-thread, hence it's no longer necessary to include it in the *built* `pdf.js` file.
With the previous patch this functionality is now *only* accessed on the worker-thread, hence it's no longer necessary to include it in the *built* `pdf.js` file.
The only reason, as far as I can tell, for parsing the Metadata on the main-thread is how it was originally implemented. When Metadata support was first implemented, it utilized the [`DOMParser`](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser) which isn't available in workers.
Today, with the custom XML-parser being used, that's no longer an issue and it seems reasonable to move the Metadata parsing to the worker-thread[1], since that's where all parsing should happen (for performance reasons).
Based on these changes, we'll be able to reduce the now unnecessary duplication of the XML-parser (and related code) in both of the *built* `pdf.js`/`pdf.worker.js` files.
Finally, this patch changes the `_repair` method to use "Array + join" rather than string concatenation.
---
[1] This needed the previous patch, to enable sending of `Map`s between threads with workers disabled.
* don't set a value in annotationStorage by default:
- having an undefined when the annotation is rendered for saving/printing means nothing has changed so use normal appearance
- aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1681687
* change the way to compute font size when this one is null in DA:
- make fontSize proportional to line height
- in multiline case, take into account the number of lines for text entered to adapt the font size
*As far as I can tell, this has been broken ever since PR 3289 (back in 2013) without anyone noticing.*
For any non-`MissingDataException` errors encountered in `ObjectLoader._walk`, we're simply throwing immediately which thus has the potential to *completely* break rendering of an entire page.
In practice this is obviously only an issue for PDF documents which are in one way or another corrupt, since that's the only way that `XRef.fetch` will throw non-`MissingDataException` errors. To make matters worse these errors are *intermittent*, since they can only occur if the document is still loading when the `ObjectLoader`-code runs (note the early return in `ObjectLoader.load`).
Please note that we cannot simply catch the error and let "normal" parsing continue in `ObjectLoader._walk`, since that could lead to errors elsewhere given that resources "below" the current one (in the graph) might not be checked as intended then.
All-in-all, the only way to make absolutely sure that we won't cause *unexpected* `MissingDataException`s somewhere else in the code-base is to fallback to fetching the *entire* document in this edge-case.
- the parser is base on a class extending XMLParserBase
- it handle xml namespaces:
* each namespace is assocated with a builder
* builder builds nodes belonging to the namespace
* when a node is inserted in the parent namespace compatibility is checked (if required)
- to avoid name collision between xml names and object properties, use Symbol.
Note how, in the `if (this.stateManager.stateStack.length !== 0) {` branch, we're attempting to access the not yet defined variable[1] `args`. If this code-path is ever hit, an Error will be thrown and parsing will thus be aborted immediately (likely leading to e.g. rendering bugs).
Note that I found this purely by accident, since I happened to glance at the LGTM report. However, I've since found that the error is also present during the unit-test[2] and with this patch we're actually testing the *intended* thing here.
As part of fixing this, and to avoid re-introducing a similar bug in the future, we'll now instead always reset `args.length` *before* attempting to read the next operator.
Also, we can use the existing `EvaluatorPreprocessor.savedStatesDepth` getter to simplify the save/restore detection a tiny bit.
---
[1] The ESLint rule `no-use-before-define` would have helped catch this problem, but unfortunately we cannot enable that without quite a bit of refactoring all over the code-base.
[2] The unit-test was updated such that it would fail in the `master`-branch.
- For wrapped scrolling, we unfortunately need to do a fair bit of parsing of the *current* page layout. Compared to e.g. the spread-modes, where we can easily tell how the pages are laid out, with wrapped scrolling we cannot tell without actually checking. In particular documents with varying page sizes require some care, since we need to check all pages on the "row" of the current page are visible and that there aren't any "holes" present. Otherwise, in the general case, there's a risk that we'd skip over pages if we'd simply always advance to the previous/next "row" in wrapped scrolling.
- For horizontal scrolling, this patch simply maintains the current behaviour of advancing *one* page at a time. The reason for this is to prevent inconsistent behaviour for the next and previous cases, since those cannot be handled identically. For the next-case, it'd obviously be simple to advance to the first not completely visible page. However for the previous-case, we'd only be able to go back *one* page since it's not possible to (easily) determine the page layout of non-visible pages (documents with varying page sizes being a particular issue).
- For vertical scrolling, this patch maintains the current behaviour by default. When spread-modes are being used, we'll now attempt to advance to the next *spread*, rather than just the next page, whenever possible. To prevent skipping over a page, this two-page advance will only apply when both pages of the current spread are visible (to avoid breaking documents with varying page sizes) and when the second page in the current spread is fully visible *horizontally* (to handle larger zoom values).
In order to reduce the performance impact of these changes, note that the previous/next-functionality will only call `getVisibleElements` for the scroll/spread-modes where that's necessary and that "normal" vertical scrolling is thus unaffected by these changes.
To support these changes, the `getVisibleElements` helper function will now also include the `widthPercent` in addition to the existing `percent` property.
The `PDFViewer._updateHelper` method is changed slightly w.r.t. updating the `currentPageNumber` for the non-vertical/spread modes, i.e. won't affect "normal" vertical scrolling, since that helped simplify the overall calculation of the page advance.
Finally, these new `BaseViewer` methods also allow (some) simplification of previous/next-page functionality in various viewer components.
*Please note:* There's one thing that this patch does not attempt to change, namely disabling of the previous/next toolbarButtons respectively the firstPage/lastPage secondaryToolbarButtons. The reason for this is that doing so would add quite a bit of complexity in general, and if for some reason `BaseViewer._getPageAdvance` would get things wrong we could end up incorrectly disabling the buttons. Hence it seemed overall safer to *not* touch this, and accept that the buttons won't be `disabled` despite in some edge-cases no further scrolling being possible.
* Add a parser to get font data from the default appearance
- pdfium & poppler use a special parser too to get these info.
* Update src/core/default_appearance.js
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
- Remove a *duplicated* reference test, see "issue12810", from the manifest.
- Improve the spelling in a couple of comments in `src/core/canvas.js`, most notable of the word "parallelogram".
- Update a comment, also in `src/core/canvas.js`, to actually agree with the value used to reduce confusion when reading the code.
While PR 12725 fixed bug 1671312 as reported, i.e. the "In the upper right corner "Purposes' has bad kerning."-part, it however broke other parts of the text rendering.
Note in particular the tables, e.g. on page 2 and beyond, where the glyphs are now rendered too close together. The reason for this is that the fonts in question are non-embedded ArialNarrow, which we just replace with Helvetica which obviously is not narrow. Given that the font replacement isn't a perfect fit for non-embedded ArialNarrow, we still need to re-measure the glyph widths in this case.
This patch is a rebased *and* refactored version of PR 9448, such that it applies cleanly given that `PDFFindController` has changed since that PR was opened; obviously keeping the original author information intact.
This patch will thus ensure that e.g. fractions, and other things that we normalize before searching, will still be highlighted correctly in the textLayer.
Furthermore, this patch also adds basic unit-tests for this functionality.
*Note:* The `[api-minor]` tag is added, since third-party implementations of the `PDFFindController` must now always use the `pageMatchesLength` property to get accurate length information (see the `web/text_layer_builder.js` changes).
Co-authored-by: Ross Johnson <ross@mazira.com>
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
* add a comment to explain how minimal linewidth is computed.
* when context.linewidth < 1 after transform, firefox and chrome
don't render in the same way (issue #12810).
* set lineWidth to 1 after transform and before stroking
- aims fix issue #12295
- a pixel can be transformed into a rectangle with both heights < 1.
A single rescale leads to a rectangle with dim equals to 1 and
the other to something greater than 1.
* change the way to render rectangle with null dimensions:
- right now we rely on the lineWidth set before "re" but
it can be set after "re" and before "S" and in this case the rendering
will be wrong.
- render such rectangles as a single line.
Note that these changes were done automatically, using `gulp lint --fix`.
With this rule, we'll thus enforce a *consistent* formatting of zero-lengths in our CSS files.
Please find additional details about the Stylelint rule at https://stylelint.io/user-guide/rules/length-zero-no-unit
There's built-in ESLint rule, see `sort-imports`, to ensure that all `import`-statements are sorted alphabetically, since that often helps with readability.
Unfortunately there's no corresponding rule to sort `export`-statements alphabetically, however there's an ESLint plugin which does this; please see https://www.npmjs.com/package/eslint-plugin-sort-exports
The only downside here is that it's not automatically fixable, but the re-ordering is a one-time "cost" and the plugin will help maintain a *consistent* ordering of `export`-statements in the future.
*Note:* To reduce the possibility of introducing any errors here, the re-ordering was done by simply selecting the relevant lines and then using the built-in sort-functionality of my editor.
Given that the PDF document in the issue contains the same very large JPEG image *three* times, this patch includes a test-case where only the first page has been extracted from it.
Given that the API will now, after PR 12039, automatically pick the correct factories to use depending on the environment (browser vs. Node.js), we can utilize that in the unit-tests as well. This way we don't have to manually repeat the same initialization code in *multiple* unit-tests.
*Note:* The *official* PDF.js API is defined in `src/pdf.js`, hence the new exports in `src/display/api.js` will not affect that.
Also, updates the unit-test `FileReaderFactory` helpers similarily.
*Drive-by change:* Fix the `CMapReaderFactory` usage in the annotation unit-tests, since the cache should only contain raw data and not a Promise. While this obviously works as-is, having unit-tests that "abuse" the intended data format can easily lead to unnecessary failures if changes are made to the relevant `src/core/` code.
Currently any errors thrown in `preEvaluateFont`, which is a *synchronous* method, will not be handled at all in the `loadFont` method and we were thus failing to return an `ErrorFont`-instance as intended here.
Also, add an *explicit* check in `PartialEvaluator.preEvaluateFont` to ensure that Type0-fonts always have a *valid* dictionary.