Reading Veach’s Thesis, Part 2
In this post, we’re continuing to read Eric Veach’s doctoral thesis. In our last installment, we covered the first half of the thesis, dealing with theoretical foundations for Monte Carlo rendering. This time we’re tackling chapters 8–9, including one of the key algorithms this thesis is famous for: multiple importance sampling. Without further ado, let’s tuck in!
As before, this isn’t going to be a comprehensive review of everything in the thesis—it’s just a selection of things that made me go “oh, that’s cool”, or “huh! I didn’t know that”.
Path-Space Integrals
We usually see the rendering equation expressed as a fixed-point integral equation. The radiance field $L$ appears on both sides: $$ L = L_e + \int L \, f \, |\cos\theta| \, \mathrm{d}\omega $$ There are theorems showing that, under suitable conditions, we can solve this as an infinite series (a Neumann series): $$ L = L_e + TL_e + T^2 L_e + \cdots $$ where $T$ is an operator representing the integral over surfaces with their BSDFs. This series constructs the solution bounce-by-bounce: first directly emitted light, then light that’s been scattered once, then scattered twice, and so on.
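To make “bounce-by-bounce” a bit more concrete: the $T^2 L_e$ term, for instance, unrolls into a nested integral, where the inner integral gathers emitted light arriving at some intermediate surface point, and the outer one scatters it once more toward the point we’re shading (arguments suppressed, as in the equations above): $$ T^2 L_e = \int \left( \int L_e \, f \, |\cos\theta'| \, \mathrm{d}\omega' \right) f \, |\cos\theta| \, \mathrm{d}\omega $$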
The trouble is, this series contains a separate integral for each possible path length. For the methods Veach is going to deploy later, he needs to be able to combine paths of all lengths in a single Monte Carlo estimator. In Chapter 8, he reformulates the rendering equation as an integral over a “space” of all possible paths: $$ L = \int L_e \, f \, \mathrm{d}\mu $$ The idea is that now we’re integrating a new kind of “variable”, which ranges over all paths (of any length) in the scene. Here, $f$ stands for the throughput along a whole path, and $L_e$ for the emitted light injected at its beginning.
By itself, this doesn’t really simplify anything; we’ve just moved the complexity from the rendering equation to the definition of the path space over which we’re integrating. This is a funny kind of “space” that actually consists of a disjoint union of an infinite sequence of subspaces, one for each possible path length. Those subspaces even have different dimensionalities, which is extra weird! But with Lebesgue measure theory, this is a legit space that can be integrated over in a mathematically rigorous way.
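To spell out what that innocuous-looking $\mathrm{d}\mu$ is hiding, the integral splits into a sum over path lengths, where a length-$k$ path has vertices $x_0, \ldots, x_k$ on scene surfaces and $f$ is the product of BSDF and geometry factors along the path (this is a rough paraphrase of the thesis’s measurement equation, with the camera response folded into $f$): $$ L = \sum_{k=1}^{\infty} \int \cdots \int L_e(x_0 \to x_1) \, f(x_0, \ldots, x_k) \, \mathrm{d}A(x_0) \cdots \mathrm{d}A(x_k) $$ Each term is an ordinary finite-dimensional integral over $k+1$ surface points; the path space is just all of these glued together.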
This sets us up for talking about probability distributions over all paths, combining different path sampling methods in an unbiased way, and so forth—which will be crucial in the following chapters.
The path-integral formulation of the rendering equation has also become quite popular in light transport theory papers today.
Non-Local Path Sampling
Veach gives an intriguing example of a potential new path sampling approach that’s facilitated by the path-integral formulation. Usually, paths are constructed incrementally starting from one end, by shooting a ray toward the next path vertex. But in the presence of specular surfaces such as a planar mirror, you could also algebraically solve for a point on the mirror that will connect two existing path vertices (say, one from a camera subpath and one from a light subpath). Even more exotically, we could consider solving for chains of multiple specular scattering events to connect a given pair of endpoints.
Veach calls this “non-local” path sampling, because it looks at vertices that aren’t just adjacent to each other on the path, but farther apart.
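As a concrete toy version of the planar-mirror case: reflect one endpoint across the mirror plane, and the specular connection point is where the straight line to the reflected endpoint crosses the plane. Here’s a minimal sketch (my own illustration, not code from the thesis; the mirror is an unbounded plane given by a point and a unit normal):

```python
import numpy as np

def mirror_connection_point(a, b, plane_point, plane_normal):
    """Solve for the point on a planar mirror where a specular bounce
    connects vertex `a` to vertex `b`; returns None if no such point exists."""
    n = plane_normal
    # Signed distances of the two endpoints from the mirror plane.
    da = np.dot(a - plane_point, n)
    db = np.dot(b - plane_point, n)
    if da <= 0.0 or db <= 0.0:
        return None  # both endpoints must be in front of the mirror

    # Reflect b across the plane; the mirror path a -> x -> b straightens out
    # into the segment a -> b_reflected, so x is where that segment meets the plane.
    b_reflected = b - 2.0 * db * n
    d = b_reflected - a
    denom = np.dot(d, n)
    if abs(denom) < 1e-12:
        return None  # segment runs parallel to the plane
    t = -da / denom
    if not (0.0 < t < 1.0):
        return None
    return a + t * d

# Example: mirror is the plane z = 0; both endpoints hover one unit above it.
x = mirror_connection_point(np.array([0.0, 0.0, 1.0]),
                            np.array([2.0, 0.0, 1.0]),
                            np.array([0.0, 0.0, 0.0]),
                            np.array([0.0, 0.0, 1.0]))
print(x)  # [1. 0. 0.] -- equal angles of incidence and reflection
```

In a real renderer you’d still have to check that the connection point lies within the mirror’s bounds and isn’t occluded, but the solve itself is just this bit of algebra.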
Veach merely sketches this idea and remarks that it could be useful. Since then, non-local sampling ideas have been researched in the manifold exploration family of techniques, such as Manifold Next-Event Estimation and Specular Manifold Sampling.
Extended Light Path Expressions
You may have seen “regular expression” syntax describing the vertices of paths, like $LS^*DE$ and suchlike. In this notation, $L$ stands for a light source, $S$ for a (Dirac) specular scattering event, $D$ a diffuse (or glossy) scattering event, and $E$ for the camera/eye. It’s a concise way to classify which kinds of paths are handled by different techniques. These “light path expressions” are widely used in the literature, as well as in production renderers to split off different lighting components into separate framebuffers.
Veach describes an extension to this notation in which extra $D$ and $S$ symbols are added to denote the continuity or discreteness of lights and cameras, in both position and directionality. For example, a point light (positionally “specular”) that radiates in all directions (“diffuse”) would be denoted $LSD$. A punctual directional light would be $LDS$, and an area light would be $LDD$. The camera is described likewise, but in the opposite order: $DSE$ is a pinhole camera, while $DDE$ is a camera with a physical lens area. These substrings are used as prefixes and suffixes for what he calls “full-path” regular expressions.
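If I’m reading the notation right, a full-path expression just concatenates these pieces: a path from an area light, bouncing off a single diffuse surface, into a pinhole camera would be written $LDD\,D\,DSE$, and the classic caustic family $LS^+DE$ with a point light and a pinhole camera becomes $LSD\,S^+D\,DSE$. (Take those with a grain of salt; as I say below, I kept tripping over this notation.)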
There’s a certain elegance to this idea, but I have to admit I found it confusing in practice, even after reading several chapters using these extended expressions. I had to keep looking up which symbol was the position and which was the direction, and stopping to think about what those labels mean in the context of a light source or camera.
This extended syntax doesn’t seem to have been adopted by much later literature, but I did see it used in the Path Space Regularization paper by Kaplanyan and Dachsbacher. They also print the light and camera substrings in different colors, to improve their readability.
Multiple Importance Sampling
Alright, now we’re getting into the real meat of Veach’s thesis! In a sense, all the foregoing material was just setup and preparation for the last three chapters, which contain the thesis’s major original contributions.
I’ll assume you’re familiar with the basic ideas of multiple importance sampling, the balance heuristic, and the power heuristic. If you need a refresher, here’s the relevant section of PBR.
The Balance Heuristic
There are some great insights here about the interpretation of the balance heuristic that I hadn’t seen before. Using the balance heuristic to combine samples from a collection of probability distributions $p_i(x)$ (e.g., light source sampling and BSDF sampling) turns out to be equivalent to sampling from a single distribution, whose probability density is the average of all the constituent ones: $$ p_\text{mis}(x) = \frac{1}{N} \sum_i p_i(x) $$ Intuitively, this is useful because the combined distribution inherits all of the peaks of the distributions contributing to it. If one sampling strategy is “good at” sampling a certain region of the integration domain, its $p_i(x)$ will tend to have a peak in that region. When several PDFs are averaged together, the resulting distribution has peaks (albeit smaller ones) everywhere any of the included strategies has a peak.
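Here’s a quick numerical sanity check of that equivalence (my own toy example, not from the thesis): a 1D integrand with two peaks, and two sampling strategies each matched to one peak, combined either with one-sample-per-strategy balance-heuristic MIS or by sampling the averaged density directly. Both estimators are unbiased, and in both cases each sample ends up contributing $f(x)/(p_1(x) + p_2(x))$; the only difference is whether the samples are stratified across the two strategies or drawn independently from the mixture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy integrand with two separated peaks.
def f(x):
    return np.exp(-0.5 * ((x - 2.0) / 0.3) ** 2) + 0.5 * np.exp(-0.5 * ((x - 7.0) / 0.5) ** 2)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Two sampling strategies, each "good at" one of the peaks.
MU1, SIG1, MU2, SIG2 = 2.0, 0.4, 7.0, 0.7
p1 = lambda x: normal_pdf(x, MU1, SIG1)
p2 = lambda x: normal_pdf(x, MU2, SIG2)

def mis_balance(n):
    """One sample from each strategy, weighted by the balance heuristic."""
    x1 = rng.normal(MU1, SIG1, n)
    x2 = rng.normal(MU2, SIG2, n)
    # Weighted contribution w_i * f / p_i collapses to f / (p1 + p2) for both strategies.
    c1 = p1(x1) / (p1(x1) + p2(x1)) * f(x1) / p1(x1)
    c2 = p2(x2) / (p1(x2) + p2(x2)) * f(x2) / p2(x2)
    return np.mean(c1 + c2)

def mixture(n):
    """Draw 2n samples directly from the averaged density 0.5*p1 + 0.5*p2."""
    use_p1 = rng.random(2 * n) < 0.5
    x = np.where(use_p1, rng.normal(MU1, SIG1, 2 * n), rng.normal(MU2, SIG2, 2 * n))
    return np.mean(f(x) / (0.5 * (p1(x) + p2(x))))

print(mis_balance(100_000), mixture(100_000))  # both ~1.38, the true integral
```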
As an illustration, here are two fictitious “PDFs” I made up, and their average:
The third curve, which simulates MIS with the balance heuristic, combines the peaks of the first two.
Here’s all three curves together:
So, the balance heuristic combines the strengths of the sampling strategies within it: it’s “pretty good at” sampling all the regions that any of the constituent strategies are “good at”.
A corollary of this fact is that the balance heuristic gives a given path the same weighted contribution no matter which strategy generated it. This isn’t the case for other MIS weighting functions, such as the power heuristic.
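To see why, write out the full weighted contribution of a sample $x$ drawn from strategy $i$ (with $n_i$ samples taken from that strategy): $$ \frac{w_i(x) \, f(x)}{n_i \, p_i(x)} = \frac{n_i \, p_i(x)}{\sum_j n_j \, p_j(x)} \cdot \frac{f(x)}{n_i \, p_i(x)} = \frac{f(x)}{\sum_j n_j \, p_j(x)} $$ which doesn’t depend on $i$ at all. With the power heuristic, the $p_i$ factors don’t cancel, so the same path contributes different amounts depending on how it was sampled.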
The Power Heuristic
The power heuristic doesn’t have quite such a tidy interpretation; it’s not equivalent to sampling any single distribution. It intuitively does something similar to the balance heuristic, but also “sharpens” the weights, making small contributions smaller and large ones larger.
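For reference, the power heuristic with exponent $\beta$ (Veach recommends $\beta = 2$) is $$ w_i(x) = \frac{\bigl(n_i \, p_i(x)\bigr)^\beta}{\sum_j \bigl(n_j \, p_j(x)\bigr)^\beta} $$ Squaring pushes weights that are already near 0 or 1 even closer to those extremes, which is the “sharpening” in question.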
According to Veach, this is helpful to reduce variance in areas where one of the included strategies is already a very close match for the integrand. In those cases, MIS isn’t really needed, and the balance heuristic can actually make things worse. The power heuristic makes things less worse.
There’s a great graph in the thesis (Figure 9.10) showing actual variance measurements for light source sampling, BSDF sampling, and the two combined with the balance heuristic or the power heuristic:
These are plotted logarithmically over several orders of magnitude in surface roughness, so they give some nice concrete evidence about the efficacy of MIS in reducing variance across a wide range of shading situations.
MIS Examples
We’ve all seen that classic MIS showcase image, with the different light source sizes versus material roughnesses. That comes from this thesis, of course! Here’s a neat Shadertoy rendition of it, created by Maxwell Planck:
Light source samples are color-coded red, and BSDF samples are green; this is a nice way to visualize how the two get weighted differently across the image.
However, I was interested to see that Veach also has a second demo scene, which I hadn’t come across before. It’s simpler and less “pretty” than the more famous one above, but in my mind it demonstrates the value of MIS even more starkly.
This scene just consists of a large emissive surface at right angles to a diffuse surface:
(Shadertoy here, which I adapted from Planck’s.)
Depending on how far you are from the light, either BSDF sampling or light source sampling is more effective at estimating the illumination: up close, the light fills much of the hemisphere, so BSDF samples hit it readily, while samples on the light’s surface suffer from the inverse-square spike of nearby points; far away, the situation reverses. So, you don’t even need a whole range of material roughnesses to benefit from MIS; area lights and diffuse walls are enough!
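To convince myself of this, here’s a little numerical experiment (entirely my own construction, with made-up geometry: an emissive wall meeting a diffuse floor, and a shading point at varying distance from the wall). It estimates the direct lighting at the shading point with pure light-surface sampling and with pure cosine-weighted BSDF sampling, and prints the per-sample standard deviation of each:

```python
import numpy as np

rng = np.random.default_rng(7)

# Scene: diffuse floor z = 0 (normal +z); emissive wall x = 0 facing +x,
# spanning y in [-A, A], z in [0, B]. The shading point sits on the floor at (d, 0, 0).
A, B = 10.0, 10.0          # light half-width and height
LE, ALBEDO = 1.0, 0.8      # light radiance and floor albedo

def light_sampling(d, n):
    """Sample points uniformly on the light; estimate Le * brdf * G / pdf."""
    y = rng.uniform(-A, A, n)
    z = rng.uniform(0.0, B, n)
    r2 = d * d + y * y + z * z
    G = (z * d) / (r2 * r2)          # cos_floor * cos_light / r^2 = (z/r)(d/r)/r^2
    pdf = 1.0 / (2.0 * A * B)        # uniform over the light's area
    return LE * (ALBEDO / np.pi) * G / pdf

def bsdf_sampling(d, n):
    """Cosine-weighted hemisphere sampling; a ray that hits the light contributes Le * albedo."""
    u1, u2 = rng.random(n), rng.random(n)
    s, phi = np.sqrt(u1), 2.0 * np.pi * u2
    wx, wy, wz = s * np.cos(phi), s * np.sin(phi), np.sqrt(1.0 - u1)
    hits = wx < 0.0                                  # only rays heading toward the wall plane x = 0
    t = np.where(hits, -d / np.where(hits, wx, -1.0), 0.0)
    on_light = hits & (np.abs(t * wy) <= A) & (t * wz <= B)
    # f * cos / pdf = (albedo/pi) * cos / (cos/pi) = albedo, so each hit contributes Le * albedo.
    return np.where(on_light, LE * ALBEDO, 0.0)

for d in (0.1, 30.0):   # close to the light, then far from it
    ls, bs = light_sampling(d, 200_000), bsdf_sampling(d, 200_000)
    print(f"d = {d:5.1f}   light-sampling std = {ls.std():8.3f}   bsdf-sampling std = {bs.std():8.3f}")
```

Up close, BSDF sampling is far less noisy and light sampling blows up; far away, the ranking flips, which is exactly the behavior this scene is designed to show off.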
Conclusion
I’ve known about multiple importance sampling for a long time, but I never felt like I quite got my head around it. I had the idea that it was something about shifting weight toward whichever sampling method gives you the “highest quality” samples in a given region, but it always seemed a little magical to me how you could determine that from purely local information (the pdfs at a single sample point).
I’m glad I took the time to read through Veach’s own explanation of this, as it goes into a lot more detail about the meaning and intuition behind the balance heuristic. I have a much better understanding of how and why it works, now.
One thing I didn’t get to address here (because I didn’t have much useful to say about it) was the optimality(-ish) proofs Veach gives. There are a few theorems proved in this chapter that roughly say something like “this heuristic might not be the best one, but it’s not that far behind the best one”. I’d like to contextualize these results better (what justifies saying it’s “not that far”?), but I haven’t yet found the right angle.
The last couple chapters in the thesis are about bidirectional path tracing and Metropolis light transport. This post has stretched long enough, so those will have to wait for another time!