Normals and the Inverse Transpose, Part 2: Dual Spaces

May 19, 2018 · Graphics, Math

In the first part of this series, we learned about Grassmann algebra, and concluded that normal vectors in 3D can be interpreted as bivectors. To transform bivectors, we need to use a different matrix (in general) than the one that transforms ordinary vectors. Using a canonical basis for bivectors, we found that the matrix required is the cofactor matrix, which is proportional to the inverse transpose. This provides at least a partial explanation of why the inverse transpose is used to transform normal vectors.

However, we also left a few loose ends untied. We found out about the cofactor matrix, but we didn’t really see how that connects to the algebraic derivation that transforming a plane equation $N \cdot x + d = 0$ involves the inverse transpose. I just sort of handwaved the proportionality between the two.

Moreover, we saw that Grassmann $k$-vectors provide vectorial geometric objects with a natural interpretation as carrying units of length, area, and volume, owing to their scaling behavior. But we didn’t find anything similar for densities—units of inverse length, area, or volume.

As we’ll see in this article, there’s one more geometric concept we need to complete the picture. Putting this new concept together with the Grassmann algebra we’ve already learned will turn out to clarify and resolve these remaining issues.

Without further ado, let’s dive in!

Functions As Vectors
Linear Forms and the Dual Space
- The Natural Pairing
- The Dual Basis
Transforming Dual Vectors
- Uniform Scaling
- Sheared Dual Vectors and the Inverse Transpose
So What’s a Normal Vector, Anyway?

Functions As Vectors

Most of this article will be concerned with functions taking and returning vectors of various kinds. To understand what follows, it’s necessary to make a bit of a mental flip, which you might find quite counterintuitive if you haven’t encountered it before.

The flip is this: functions that map into a vector space are themselves vectors.

That statement might not appear to make any sense at first! Vectors and functions are totally different kinds of things, right, like apples and…chairs? How can a function literally be a vector?

If you look up the technical definition of a vector space, you’ll find that it’s quite nonspecific about what the structure of a vector has to be. We often think of them as arrows with a magnitude and direction, or as ordered lists of numbers (coordinates). However, all you truly need for a vector space is a set of things that support two basic operations: being added together, and being multiplied by scalars (here, real numbers). These operations just need to obey a few reasonable axioms.

Well, functions can be added together! If we have two functions $f$ and $g$, we can add them pointwise to produce a new function $h$, defined by $h(x) = f(x) + g(x)$ for every point $x$ in the domain. Likewise, we can multiply a function pointwise by a scalar: $g(x) = a \cdot f(x)$. These operations do satisfy the vector space axioms, and therefore the set of all functions sharing a given domain and range forms a vector space in its own right: a function space.

To put it a bit more formally: given a domain set $X$ (any kind of set, not necessarily a vector space itself) and a range vector space $V$, the set of functions $f: X \to V$ forms a vector space under pointwise addition and scalar multiplication. You need the range to be a vector space so you can add and multiply the outputs of the functions, but the domain isn’t required to be a vector space—or even a “space” per se at all; it could be a discrete set.
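To make the pointwise operations concrete, here is a minimal Python sketch. The helper names `add` and `scale` are hypothetical, and elements of the range $V = \Bbb R^2$ are represented as plain lists of floats. The domain is deliberately a discrete set (strings), to echo the point that it need not be a vector space.

```python
from typing import Callable, List

Vec = List[float]                 # an element of the range V = R^n, stored as a list
Func = Callable[[str], Vec]       # a function from a discrete domain (strings) into V

def add(f: Func, g: Func) -> Func:
    """Pointwise sum: h(x) = f(x) + g(x) for every x in the domain."""
    return lambda x: [a + b for a, b in zip(f(x), g(x))]

def scale(c: float, f: Func) -> Func:
    """Pointwise scalar multiple: h(x) = c * f(x)."""
    return lambda x: [c * a for a in f(x)]

# Two arbitrary example functions from strings into R^2.
f: Func = lambda s: [float(len(s)), 1.0]
g: Func = lambda s: [0.0, float(s.count("a"))]

h = add(f, scale(2.0, g))   # the "vector" f + 2g in the function space
print(h("banana"))          # [6.0, 7.0]
```

Nothing in `add` or `scale` inspects the domain; they only need to add and scale outputs, which is exactly why the range, but not the domain, must be a vector space.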

This realization that functions can be treated as vectors then lets us apply linear-algebra techniques to understand and work with functions—a large branch of mathematics called functional analysis.

Linear Forms and the Dual Space

From this point forward, we’ll be concerned with a specific class of functions known as linear forms.

If we have some vector space $V$ (such as 3D space $\Bbb R^3$, for instance), then a linear form on $V$ is defined as a linear function $f: V \to \Bbb R$. That is, it’s a linear function that takes a vector argument and returns a scalar.

(A note for the mathematicians: in this article I’m only talking about finite-dimensional vector spaces over $\Bbb R$, so I may occasionally make a statement that doesn’t hold for general vector spaces. Sorry!)

I like to visualize a linear form as a set of parallel, uniformly spaced planes (3D) or lines (2D): the level sets of the function at intervals of one unit in the output. Here are some examples:

[Figure: the linear forms x + y, −⅓x + ½y, and 2x + y]

The gradients here indicate the linear form’s orientation: opacity increases as the function’s value increases, and the discrete lines mark where its output crosses an integer, at which point the opacity wraps around to zero. Note that “bigger” linear forms (in the sense of bigger output values) have more tightly spaced lines, and vice versa.
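The inverse relationship between a form’s size and its line spacing can be checked numerically. In the ordinary Euclidean picture, the level sets $px + qy = k$ and $px + qy = k + 1$ are separated by a perpendicular distance of $1 / \sqrt{p^2 + q^2}$. This small Python sketch (the function name is hypothetical) evaluates that spacing for the three example forms.

```python
import math

def level_set_spacing(p: float, q: float) -> float:
    """Perpendicular distance between adjacent unit level sets of w(x, y) = p*x + q*y,
    measured with the ordinary Euclidean metric."""
    return 1.0 / math.hypot(p, q)

examples = {
    "x + y": (1.0, 1.0),
    "-1/3 x + 1/2 y": (-1.0 / 3.0, 0.5),
    "2x + y": (2.0, 1.0),
}
for name, (p, q) in examples.items():
    print(f"{name:>15}: spacing {level_set_spacing(p, q):.3f}")
```

Larger coefficients give smaller spacing: of the three, 2x + y has the most tightly packed lines and −⅓x + ½y the most widely spaced.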

As elaborated in the previous section, linear forms on a given vector space can themselves be treated as vectors, in their own function space. Linear combinations of linear functions are still linear, so they do form a closed vector space in their own right.

This vector space—the set of all linear forms on $V$—is important enough that it has its own name: the dual space of $V$. It’s denoted $V^*$. The elements of the dual space (the linear forms) are then called dual vectors, or sometimes covectors.

The Natural Pairing

The fact that dual vectors are linear functions, and not general functions from $V$ to $\Bbb R$, strongly restricts their behavior. Linear forms on an $n$-dimensional vector space have only $n$ degrees of freedom, versus the infinite degrees of freedom that a general function has. To put it another way, $V^*$ has the same dimensionality as $V$.

To see this more concretely: a linear form on $\Bbb R^n$ can be fully specified by the values it returns when you evaluate it on the $n$ vectors of a basis. The result it returns for any other vector can then be derived by linearity. For example, if $f$ is a linear form on $\Bbb R^3$, and $v = (x, y, z)$ is an arbitrary vector, then: $$ \begin{aligned} f(v) &= f(x {\bf e_x} + y {\bf e_y} + z {\bf e_z}) \\ &= x \, f({\bf e_x}) + y \, f({\bf e_y}) + z \, f({\bf e_z}) \end{aligned} $$ If you’re thinking that the above looks awfully like a dot product between $(x, y, z)$ and $\bigl(f({\bf e_x}), f({\bf e_y}), f({\bf e_z}) \bigr)$—you’re right!
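Here is a small Python sketch of that argument, using exact `Fraction` arithmetic. The form `f` is a hypothetical example written as an opaque function; we recover its three defining numbers by evaluating it on the basis, then rebuild its value on an arbitrary vector by linearity.

```python
from fractions import Fraction

def f(v):
    """A hypothetical linear form on R^3, treated as a black box."""
    x, y, z = v
    return 3 * x - 2 * y + Fraction(1, 2) * z

basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # e_x, e_y, e_z

# Three numbers fully determine f: its values on the basis vectors.
coeffs = [f(e) for e in basis]

def f_by_linearity(v):
    """x f(e_x) + y f(e_y) + z f(e_z): a dot product with the stored values."""
    return sum(c * vi for c, vi in zip(coeffs, v))

v = (4, -1, 6)
print(f(v), f_by_linearity(v))   # both 17
```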

Indeed, the operation of evaluating a linear form has the properties of a product between the dual space and the base vector space: $V^* \times V \to \Bbb R$. This product is called the natural pairing.

Like the vector dot product, the natural pairing results in a real number, and is bilinear—linear on both sides. However, here we’re taking a product not of two vectors, but of a dual vector with a “plain” vector. The linearity on the left side comes from pointwise adding/multiplying linear forms; that on the right comes from the linear forms being, well, linear in their vector argument.

Going forward, I’ll denote the natural pairing by angle brackets, like this: $\langle w, v \rangle$. Here $w$ is a dual vector in $V^*$, and $v$ is a vector in $V$. To reiterate, this is simply evaluating the linear form $w$, as a function, on the vector $v$. But because functions are vectors, and dual vectors in particular are linear functions, this operation also has the properties of a product.

The above equation looks like this in angle-bracket notation: $$ \begin{aligned} \langle w, v \rangle &= \bigl\langle w, \, x {\bf e_x} + y {\bf e_y} + z {\bf e_z} \bigr\rangle \\ &= x \langle w, {\bf e_x} \rangle + y \langle w, {\bf e_y} \rangle + z \langle w, {\bf e_z} \rangle \end{aligned} $$ Note how this now looks like “just” an application of the distributive property—which it is!

The Dual Basis

The above construction can also be used to define a canonical basis for $V^*$, for a given basis on $V$. Namely, we want to make the numbers $\langle w, {\bf e_x} \rangle, \langle w, {\bf e_y} \rangle, \langle w, {\bf e_z} \rangle$ be the coordinates of $w$ with respect to this basis, the same way that $x, y, z$ are coordinates with respect to $V$’s basis. We can do this by defining dual basis vectors ${\bf e_x^*}, {\bf e_y^*}, {\bf e_z^*}$, according to the following constraints: $$ \begin{aligned} \langle {\bf e_x^*}, {\bf e_x} \rangle &= 1 \\ \langle {\bf e_x^*}, {\bf e_y} \rangle &= 0 \\ \langle {\bf e_x^*}, {\bf e_z} \rangle &= 0 \end{aligned} $$ and similarly for ${\bf e_y^*}, {\bf e_z^*}$. The nine total constraints can be summarized as: $$ \langle {\bf e}_i^*, {\bf e}_j \rangle = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases} \quad i, j \in \{ {\bf x, y, z} \} $$ This dual basis always exists and is unique, given a valid basis on $V$ to start from.

Geometrically speaking, the dual basis consists of linear forms that measure the distance along each axis—but the level sets of those linear forms are parallel to all the other axes. They’re not necessarily perpendicular to the same axis that they’re measuring, unless the basis happens to be orthonormal. This feature will be important a bit later!
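One convenient way to compute a dual basis, sketched below in Python: put the basis vectors in the columns of a matrix $B$; the rows of $B^{-1}$ then satisfy exactly the constraints above, because $B^{-1} B = I$. Each dual basis element is stored as its coefficients in standard coordinates, so $(p, q)$ means the form $px + qy$. The sheared basis is an arbitrary example, chosen to show that ${\bf e_x^*}$ need not be perpendicular to ${\bf e_x}$ when the basis is not orthonormal.

```python
from fractions import Fraction as F

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("basis vectors are linearly dependent")
    return ((d / det, -b / det), (-c / det, a / det))

def dual_basis(e_x, e_y):
    """Dual basis of {e_x, e_y}: the rows of B^-1, where B has e_x and e_y as columns."""
    B = ((e_x[0], e_y[0]), (e_x[1], e_y[1]))
    return inverse_2x2(B)

def pair(w, v):
    """Evaluate the form with standard-coordinate coefficients w on the vector v."""
    return w[0] * v[0] + w[1] * v[1]

e_x, e_y = (F(1), F(0)), (F(1, 2), F(1))   # a non-orthonormal basis
e_x_star, e_y_star = dual_basis(e_x, e_y)

# Check all four constraints <e_i*, e_j> = 1 if i == j else 0.
for i, ws in enumerate((e_x_star, e_y_star)):
    for j, e in enumerate((e_x, e_y)):
        assert pair(ws, e) == (1 if i == j else 0)

# e_x* works out to x - y/2: its level sets run parallel to e_y, not perpendicular to e_x.
print(e_x_star)
```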

By way of example, here are a couple of vector bases together with their corresponding dual bases:

[Figure: an orthonormal basis and its corresponding dual basis; a non-orthonormal basis and its corresponding dual basis]

Here’s an example of a linear form decomposed into basis components, $w = p {\bf e_x^*} + q {\bf e_y^*}$:

[Figure: a linear form as a sum of x and y basis components]

With the dual basis defined as above, if we express both a dual vector $w$ and a vector $v$ in terms of their respective bases, then the natural pairing $\langle w, v \rangle$ boils down to just the dot product of the respective coordinates: $$ \begin{aligned} \langle w, v \rangle &= \bigl\langle p {\bf e_x^*} + q {\bf e_y^*} + r {\bf e_z^*}, \; x {\bf e_x} + y {\bf e_y} + z {\bf e_z} \bigr\rangle \\ &= px + qy + rz \end{aligned} $$
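To see that this reduction does not depend on the basis being orthonormal, here is a 2D check in Python. The basis, the form, and the vector are all hypothetical examples. The form is stored by its coefficients in standard coordinates, and its dual-basis coordinates are obtained as its values on the basis vectors, just as in the construction of the dual basis.

```python
from fractions import Fraction as F

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

e_x, e_y = (F(2), F(0)), (F(1), F(1))   # a non-orthonormal basis, in standard coordinates
w_std = (F(3), F(-1))                   # the form w(v) = 3*v[0] - v[1]

# Coordinates of w in the dual basis: p = <w, e_x>, q = <w, e_y>.
p, q = dot(w_std, e_x), dot(w_std, e_y)

# A vector with coordinates (x, y) in the basis {e_x, e_y}, converted to standard coordinates.
x, y = F(5), F(-2)
v_std = tuple(x * a + y * b for a, b in zip(e_x, e_y))

print(dot(w_std, v_std), p * x + q * y)   # both 26
```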

Transforming Dual Vectors

In the preceding article, we learned that although vectors and bivectors may appear structurally similar (they both have three components, in 3D space), they have different geometric meanings and different behavior when subject to transformations—in particular, to scaling.

With dual vectors, we have a third example in this class! Dual vectors are again “vectorial” objects (obeying the vector space axioms), again structurally similar to vectors and bivectors (having three components, in 3D space), but with a different geometric meaning (linear forms). This immediately suggests we look into dual vectors’ transformation behavior!

Dual vectors are linear forms, which are functions. So how do we transform a function?

The way I like to think about this is that the function’s output values are carried along with the points of its domain when they’re transformed. Imagine labeling every point in the domain with the function’s value at that point. Then apply the transformation to all the points; they move somewhere else, but carry their label along with them. (Another way of thinking about it is that you’re transforming the graph of the function, considered as a point-set in a one-higher-dimensional space.)

To formalize this a bit more: suppose we transform vectors by some matrix $M$, and we want to apply this transformation also to a function $f(v)$, yielding a new function $g(v)$. What we want is that $g$ evaluated on a transformed vector should equal $f$ evaluated on the original vector: $$ g(Mv) = f(v) $$ Or, equivalently, $$ g(v) = f(M^{-1}v) $$ In other words, we can apply a transformation to a function by making a new function that first applies the inverse transformation to its argument, then passes that to the old function.
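The recipe $g(v) = f(M^{-1} v)$ translates directly into code. In this Python sketch, `transform_function` is a hypothetical 2D helper, and the example function is deliberately nonlinear, since the recipe applies to any function, not only to linear forms. It raises an error for a singular $M$, because the inverse is required.

```python
from fractions import Fraction as F

def mat_vec(m, v):
    return tuple(sum(mij * vj for mij, vj in zip(row, v)) for row in m)

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("M is not invertible")
    return ((d / det, -b / det), (-c / det, a / det))

def transform_function(f, m):
    """Return g with g(M v) = f(v), i.e. g(v) = f(M^-1 v)."""
    m_inv = inverse_2x2(m)
    return lambda v: f(mat_vec(m_inv, v))

f = lambda v: v[0] ** 2 + 3 * v[1]          # any scalar function will do
M = ((F(2), F(1)), (F(0), F(1)))            # an arbitrary invertible transform
g = transform_function(f, M)

v = (F(1), F(4))
print(f(v), g(mat_vec(M, v)))   # both 13: the value travels with the point
```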

Note that this only works if $M$ is invertible. If it isn’t, then our picture of “carrying the output values along with the domain points” falls apart: a noninvertible $M$ can collapse many distinct domain points into one, and then how could we decide what the function’s output should be at those points?

Uniform Scaling

Now that we understand how to apply a transformation to a function, let’s look at uniform scaling as an example. We’ll scale by a factor $a > 0$, so that vectors transform as $v \mapsto av$. Then functions will transform as $f(v) \mapsto f(v/a)$, per the previous section.

Let’s switch back to looking at this from a “dual vector” point of view instead of a “function” point of view. So, if $f(v) = \langle w, v \rangle$ for some dual vector $w$, then what happens when we scale by $a$? $$ \begin{aligned} \langle w, v \rangle \mapsto & \left\langle w, \frac{v}{a} \right\rangle \\ = & \left\langle \frac{w}{a}, v \right\rangle \end{aligned} $$ I’ve just moved the $1/a$ factor from one side of the angle brackets to the other, which is allowed because it’s a bilinear operation. To summarize, we’ve found that the dual vector $w$ transforms as: $$ w \mapsto \frac{w}{a} $$
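A quick numerical check of this derivation, in Python with exact fractions: build the transformed function $g(v) = f(v/a)$ literally, then confirm that it agrees with pairing against $w/a$. The scale factor and the dual vector’s coordinates are arbitrary examples.

```python
from fractions import Fraction as F

def pair(w, v):
    """Natural pairing <w, v>, with w and v in matched (dual) basis coordinates."""
    return sum(wi * vi for wi, vi in zip(w, v))

a = F(3)                                  # uniform scale factor, a > 0
w = (F(2), F(-1), F(5))                   # an arbitrary dual vector
f = lambda v: pair(w, v)

g = lambda v: f(tuple(vi / a for vi in v))   # transformed function: g(v) = f(v / a)
w_scaled = tuple(wi / a for wi in w)         # claimed transformed dual vector: w / a

for v in [(F(1), F(0), F(0)), (F(4), F(-2), F(7)), (F(3), F(3), F(3))]:
    assert g(v) == pair(w_scaled, v)
```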

Hmm, interesting! When we scale vectors by $a$, then dual vectors scale by $\bm{1/a}$. If you recall the previous article, we justified assigning units like “area” and “volume” to bivectors and trivectors on the basis of their scaling behavior. Following that line of reasoning, we can now conclude that dual vectors carry units of inverse length!

In fact, dual vectors represent oriented linear densities. They provide a quantitative way of talking about situations where some kind of scalar “stuff”—such as probability, texel count, opacity, a change in voltage/temperature/pressure, etc.—is spread out along one dimension in space. When you pair the dual vector with a vector (i.e. evaluate the linear form on a vector), you’re asking “how much of that ‘stuff’ does this vector span?”

Under a scaling, we want to preserve the amount of “stuff”. If we’re scaling up, then the density of “stuff” will need to go down, as the same amount of stuff is now spread over a longer distance; and vice versa. This property is implemented by the inverse scaling behavior of dual vectors.
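As a worked example with made-up numbers, take texel density: suppose a texture packs 256 texels into each unit of length along $x$, so its density is the dual vector $256\,{\bf e_x^*}$, and an edge 2 units long along $x$ spans $\langle 256\,{\bf e_x^*}, 2\,{\bf e_x} \rangle = 512$ texels. Scaling everything up by 4 stretches the edge to 8 units and dilutes the density to 64 texels per unit, leaving the count unchanged. In Python:

```python
from fractions import Fraction as F

def pair(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

density = (F(256), F(0))   # hypothetical: 256 texels per unit length along x
edge = (F(2), F(0))        # an edge 2 units long along x

a = F(4)                                          # uniform scale factor
edge_scaled = tuple(a * c for c in edge)          # vectors scale by a
density_scaled = tuple(c / a for c in density)    # dual vectors scale by 1/a

before, after = pair(density, edge), pair(density_scaled, edge_scaled)
print(before, after)   # 512 512
```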

Sheared Dual Vectors and the Inverse Transpose

We’ve seen how uniform scaling applies inversely to dual vectors. We could study nonuniform scaling now, too, but it turns out that axis-aligned nonuniform scaling isn’t that interesting—it just applies inversely to each axis, as you might expect. It’ll be more illuminating at this point to look at what happens with a shear.

I’ll stick to 2D for this one. As an example transformation, we’ll shear the $y$ axis toward $x$ a little bit: $$ M = \begin{bmatrix} 1 & \tfrac{1}{2} \\ 0 & 1 \end{bmatrix} $$ Here’s what it looks like:

[Figure: the shear applied to a standard vector basis]

When we perform this transformation on a dual vector, what happens? When you look at it visually, it’s pretty straightforward—the level sets (isolines) of the linear form will tilt to follow the shear.

[Animation: a linear form shearing]

But how do we express this as a matrix acting on the dual vector’s coordinates? Let’s focus on the $\bf e_x^*$ component. Note that our transformation $M$ doesn’t affect the $x$-axis—it maps $\bf e_x$ to itself. But what about $\bf e_x^*$?

[Animation: $\bf e_x^*$ shearing]

The $\bf e_x^*$ component of a dual vector does change under this transformation, because the isolines pick up the shear! Or, to put it another way: although distances along the $x$ axis (which $\bf e_x^*$ measures) don’t change here, $\bf e_x^*$ still cares about what the other axes are doing because it has to stay parallel to them. That’s one of the defining conditions for the dual basis to do its job.

In particular, we have that $\bf e_x^*$ maps to ${\bf e_x^*} - \tfrac{1}{2}{\bf e_y^*}$. If we work it out the rest of the way, the full matrix that applies to the coordinates of a dual vector is: $$ \begin{bmatrix} 1 & 0 \\ -\tfrac{1}{2} & 1 \end{bmatrix} $$ This is the inverse transpose of $M$!
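The same result falls out of a few lines of Python (exact fractions, hypothetical helper names): compute $M^{-T}$ for the shear, apply it to the coordinates of ${\bf e_x^*}$, and confirm that the transformed form $g$ satisfies $g(Mv) = \langle {\bf e_x^*}, v \rangle$, the defining rule for transforming a function.

```python
from fractions import Fraction as F

def mat_vec(m, v):
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)

def transpose(m):
    return tuple(zip(*m))

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def pair(w, v):
    return sum(a * b for a, b in zip(w, v))

M = ((F(1), F(1, 2)), (F(0), F(1)))   # the shear from the text
M_inv_T = transpose(inverse_2x2(M))

e_x_star = (F(1), F(0))
g = mat_vec(M_inv_T, e_x_star)        # expect (1, -1/2), i.e. e_x* - (1/2) e_y*

# The transformed form, evaluated on transformed vectors, matches the original form.
for v in [(F(1), F(0)), (F(0), F(1)), (F(3), F(2))]:
    assert pair(g, mat_vec(M, v)) == pair(e_x_star, v)
```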

We can loosely relate the effect of the inverse transpose here to that of the cofactor matrix for bivectors, as seen in the preceding article. Like a bivector, each dual basis element cares about what’s happening to the other axes (because it needs to keep parallel to them)—but it also must scale inversely along its own axis. The determinant of $M$ gives the cumulative scaling along all the axes: $$ \det M = \text{scaling on my axis} \cdot \text{scaling on other axes} $$ We can algebraically rearrange this to: $$ \frac{1}{\text{scaling on my axis}} = \frac{1}{\det M} \cdot \text{scaling on other axes} $$ This matches the relation between the inverse transpose and the cofactor matrix. $$ M^{-T} = \frac{1}{\det M} \cdot \text{cofactor}(M) $$ I’m handwaving a lot here—a detailed geometric demonstration would take us off into the weeds—but hopefully this gives at least a little bit of intuition for why the inverse transpose matrix is the right thing to use for dual vectors.
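The algebraic identity itself, $M^{-T} = \frac{1}{\det M} \cdot \text{cofactor}(M)$, is easy to verify numerically. This Python sketch computes the cofactor matrix of an arbitrary invertible $3 \times 3$ matrix from its $2 \times 2$ minors, divides by the determinant, and checks that the transpose of the result really is $M^{-1}$.

```python
from fractions import Fraction as F

def minor(m, i, j):
    """Determinant of the 2x2 submatrix left after deleting row i and column j."""
    (a, b), (c, d) = [[x for k, x in enumerate(row) if k != j]
                      for r, row in enumerate(m) if r != i]
    return a * d - b * c

def cofactor(m):
    return [[(-1) ** (i + j) * minor(m, i, j) for j in range(3)] for i in range(3)]

def det3(m):
    c = cofactor(m)
    return sum(m[0][j] * c[0][j] for j in range(3))   # expansion along the first row

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(m):
    return [list(row) for row in zip(*m)]

M = [[F(2), F(1), F(0)],
     [F(0), F(1), F(3)],
     [F(1), F(0), F(1)]]             # an arbitrary invertible example
d = det3(M)
M_inv_T = [[c / d for c in row] for row in cofactor(M)]

# If M_inv_T really is the inverse transpose, then M times its transpose is the identity.
print(mat_mul(M, transpose(M_inv_T)) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # True
```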

So What’s a Normal Vector, Anyway?

As we’ve seen, the level sets of a linear form are parallel lines in 2D, or planes in 3D. This implies that we can define a plane by picking out a specific level set of a given dual vector: $$ \langle w, v \rangle = d $$ The function $\langle w, v \rangle - d$ then acts as a signed distance field for the plane, up to a scale factor set by the magnitude of $w$.

We’ve also seen that when expressed in terms of matched basis-and-dual-basis components, the natural pairing product $\langle w, v \rangle$ reduces to a dot product $w \cdot v$. And then the above equation looks like the familiar plane equation: $$ w \cdot v = d $$ This shows that the dual vector’s coordinates with respect to the dual basis are also the coordinates of a normal vector to the plane, in the standard vector basis.

So, normal vectors can be interpreted as dual vectors expressed in the dual basis, and that’s why they transform with the inverse transpose!
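Here is the practical upshot as a Python check, using a hypothetical 2D transform and the line $x + y = 2$. Points on the line are moved by $M$; transforming the normal by $M^{-T}$ (treating it as a dual vector) keeps every moved point on the level set $n \cdot v = 2$, while transforming it by $M$ itself (treating it as an ordinary vector) does not.

```python
from fractions import Fraction as F

def mat_vec(m, v):
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)

def transpose(m):
    return tuple(zip(*m))

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

M = ((F(2), F(1)), (F(0), F(1)))                   # a shear combined with a nonuniform scale
n, d = (F(1), F(1)), F(2)                           # the line x + y = 2
points = [(F(2), F(0)), (F(0), F(2)), (F(1), F(1))]
assert all(dot(n, p) == d for p in points)

moved = [mat_vec(M, p) for p in points]
n_dual = mat_vec(transpose(inverse_2x2(M)), n)      # normal as a dual vector
n_naive = mat_vec(M, n)                             # normal as an ordinary vector

print([dot(n_dual, p) == d for p in moved])    # [True, True, True]
print([dot(n_naive, p) == d for p in moved])   # [False, False, False]
```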

But wait—in the last article, didn’t I just say that normal vectors should be interpreted as bivectors, and therefore they transform with the cofactor matrix? Which one is it?

Ultimately, I don’t think there’s a definitive answer to this question! “Normal vector” as an idea is a bit too vague: bivectors and dual vectors are both defensible ways to formalize the “normal vector” concept. As we’ve seen, they transform equivalently as far as orientation is concerned: bivectors and dual vectors both transform to stay perpendicular to the plane they define, by either $B \wedge v = d$ or $\langle w, v \rangle = d$, respectively. The difference between them is in the units they carry and their scaling behavior: bivectors are areas, while dual vectors are inverse lengths.
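The difference in units shows up clearly under a uniform scale by $a$. In this Python sketch (hypothetical example values), the bivector spanned by two edges of a unit square in the plane $z = 1$ is represented by its three components, which in an orthonormal basis coincide with the cross product of the edges; it grows by $a^2$. The dual vector $w$ defining the same plane by $\langle w, v \rangle = 1$ is transformed by $(aI)^{-T} = I/a$ and shrinks by $1/a$, yet both still point along $z$.

```python
from fractions import Fraction as F

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def scale(c, u):
    return tuple(c * x for x in u)

a = F(2)                                         # uniform scale factor
u, v = (F(1), F(0), F(0)), (F(0), F(1), F(0))    # edges of a unit square in the plane z = 1
p = (F(3), F(-1), F(1))                          # a point on that plane
w = (F(0), F(0), F(1))                           # dual vector: the plane is <w, x> = 1

# Bivector-style normal: components of u ^ v (equal to the cross product here).
B_before = cross(u, v)
B_after = cross(scale(a, u), scale(a, v))

# Dual-vector-style normal: transformed by (a I)^-T = (1/a) I.
w_after = scale(1 / a, w)

print(B_after == scale(a ** 2, B_before))        # True: an area, scales by a^2
print(dot(w_after, scale(a, p)) == dot(w, p))    # True: an inverse length, scales by 1/a
```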

That’s all I have to say about transforming normal vectors! But we’ve got another question still dangling. At the end of Part 1, I asked about vectorial quantities with negative scaling powers. In dual vectors, we’ve now achieved scaling power −1. But what about −2 and −3? To find those, we’re going to have to combine dual spaces with Grassmann algebra. We’ll do that in the third and final part of this series.

Nathan Reed