<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
	<channel>
		<title>Nathan Reed’s coding blog</title>
		<link>https://www.reedbeta.com/</link>
		<description>Latest posts on Nathan Reed’s coding blog</description>
		<language>en-us</language>
		<lastBuildDate>Sun, 30 Nov 2025 23:16:57 +0000</lastBuildDate>
		<atom:link href="https://www.reedbeta.com/feed/" rel="self" type="application/rss+xml" />
			<item>
				<title>Reading Veach’s Thesis, Part 2</title>
				<link>https://www.reedbeta.com/blog/reading-veach-thesis-2/</link>
				<guid>https://www.reedbeta.com/blog/reading-veach-thesis-2/</guid>
				<dc:creator>Nathan Reed</dc:creator>
				<pubDate>Sat, 25 Feb 2023 10:24:44 -0800</pubDate>
				<comments>https://www.reedbeta.com/blog/reading-veach-thesis-2/#comments</comments>
					<category>Graphics</category>
					<category>Math</category>
				<description>&lt;p&gt;In this post, we’re continuing to read &lt;a href=&#34;http://graphics.stanford.edu/papers/veach_thesis/&#34;&gt;Eric Veach’s doctoral thesis&lt;/a&gt;.
In &lt;a href=&#34;/blog/reading-veach-thesis/&#34;&gt;our last installment&lt;/a&gt;, we covered the first half of the
thesis, dealing with theoretical foundations for Monte Carlo rendering. This time
we’re tackling chapters 8–9, including one of the key algorithms this thesis is famous for:
multiple importance sampling. Without further ado, let’s tuck in!&lt;/p&gt;
&lt;p&gt;As before, this isn’t going to be a comprehensive review of everything in the thesis—it’s just a
selection of things that made me go “oh, that’s cool”, or “huh! I didn’t know that”.&lt;/p&gt;
&lt;!--more--&gt;

&lt;div class=&#34;toc&#34;&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#path-space-integrals&#34;&gt;Path-Space Integrals&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#non-local-path-sampling&#34;&gt;Non-Local Path Sampling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#extended-light-path-expressions&#34;&gt;Extended Light Path Expressions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#multiple-importance-sampling&#34;&gt;Multiple Importance Sampling&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#the-balance-heuristic&#34;&gt;The Balance Heuristic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#the-power-heuristic&#34;&gt;The Power Heuristic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#mis-examples&#34;&gt;MIS Examples&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#conclusion&#34;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&#34;path-space-integrals&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#path-space-integrals&#34; title=&#34;Permalink to this section&#34;&gt;Path-Space Integrals&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We usually see the rendering equation expressed as a fixed-point integral equation. The radiance
field $L$ appears on both sides:
$$
    L = L_e + \int L \, f \, |\cos\theta| \, \mathrm{d}\omega
$$
There are some theorems showing that we can solve this as an infinite series:
$$
    L = L_e + TL_e + T^2 L_e + \cdots
$$
where $T$ is an operator representing the integral over surfaces with their BSDFs. This series
constructs the solution
bounce-by-bounce: first directly emitted light, then light that’s been scattered once, then
scattered twice, and so on.&lt;/p&gt;
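&lt;p&gt;To see where the series comes from (a quick sketch, writing the equation above as $L = L_e + TL$),
substitute the right-hand side for $L$ repeatedly:
$$
    L = L_e + T(L_e + TL) = L_e + TL_e + T^2 (L_e + TL) = \cdots = \sum_{k=0}^{n} T^k L_e + T^{n+1} L
$$
The leftover term $T^{n+1} L$ dies away as $n$ grows, provided each bounce loses some energy
(roughly, the operator norm of $T$ is strictly less than 1). It’s the same pattern as the geometric
series $1/(1-t) = 1 + t + t^2 + \cdots$.&lt;/p&gt;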
&lt;p&gt;The trouble is, this series contains a separate integral for each possible path length.
For the methods Veach is going to deploy later, he needs to be able to
combine paths of all lengths in a single Monte Carlo estimator. In Chapter 8, he
reformulates the rendering equation as an integral over a “space” of all possible paths:
$$
    L = \int L_e \, f \, \mathrm{d}\mu
$$
The idea is that now we’re integrating a new kind of
“variable”, which ranges over &lt;em&gt;all paths&lt;/em&gt; (of any length) in the scene. Here,
$f$ stands for the throughput along a whole path, and $L_e$ for the emitted light injected at its
beginning.&lt;/p&gt;
&lt;p&gt;By itself, this doesn’t really simplify anything; we’ve just moved the complexity from the rendering
equation to the definition of the &lt;em&gt;path space&lt;/em&gt; over which we’re integrating. This is a funny kind of
“space” that actually consists of a &lt;a href=&#34;https://en.wikipedia.org/wiki/Disjoint_union&#34;&gt;disjoint union&lt;/a&gt;
of an infinite sequence of subspaces, one for each possible path length. Those subspaces even have
different dimensionalities, which is extra weird! But with Lebesgue measure theory, this is a legit
space that can be integrated over in a mathematically rigorous way.&lt;/p&gt;
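&lt;p&gt;For concreteness, here’s a paraphrase of how the construction goes. The symbols $\Omega_k$, $\mu_k$,
and $\bar x$ are meant to follow Veach’s notation, but treat the details as a sketch rather than a quote.
Let $\Omega_k$ be the set of paths $\bar x = x_0 x_1 \ldots x_k$ with $k$ segments, each vertex lying on a
scene surface, and measure it by the product of surface areas. Then take the union over all lengths:
$$
    \Omega = \bigcup_{k \geq 1} \Omega_k, \qquad
    \mu(D) = \sum_{k \geq 1} \mu_k(D \cap \Omega_k), \qquad
    \mathrm{d}\mu_k(\bar x) = \mathrm{d}A(x_0) \cdots \mathrm{d}A(x_k)
$$
An integral over $\Omega$ is then just a sum of ordinary integrals, one per path length:
$\int_\Omega g \, \mathrm{d}\mu = \sum_{k \geq 1} \int_{\Omega_k} g \, \mathrm{d}\mu_k$.&lt;/p&gt;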
&lt;p&gt;This sets us up for talking about probability distributions over all paths, combining different path
sampling methods in an unbiased way, and so forth—which will be crucial in the following chapters.&lt;/p&gt;
&lt;p&gt;The path-integral formulation of the rendering equation has also become quite popular in light
transport theory papers today.&lt;/p&gt;
&lt;h2 id=&#34;non-local-path-sampling&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#non-local-path-sampling&#34; title=&#34;Permalink to this section&#34;&gt;Non-Local Path Sampling&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Veach gives an intriguing example of a potential new path sampling approach that’s facilitated by the
path-integral formulation. Usually, paths are constructed incrementally starting from one end, by
shooting a ray toward the next path vertex. But in the presence of specular surfaces such as a
planar mirror, you could also algebraically solve for a point on the mirror
that will connect two existing path vertices (say, one from a camera subpath and one from a light
subpath). Even more exotically, we could consider solving for chains of multiple specular scattering
events to connect a given pair of endpoints.&lt;/p&gt;
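&lt;p&gt;The planar-mirror case is simple enough to sketch in a few lines. This is my own illustration, not
code from the thesis: the function name &lt;code&gt;mirror_connection&lt;/code&gt; is made up, the mirror is assumed
to be the infinite plane z = 0, and both vertices are assumed to lie strictly above it. The trick is the
classic one of reflecting one endpoint through the plane and intersecting a straight line with the mirror.&lt;/p&gt;

```python
def mirror_connection(a, b):
    """Find the point p on the mirror plane z = 0 that reflects light from b to a.

    Assumes a and b are (x, y, z) tuples strictly above the plane (z positive).
    """
    bx, by, bz = b
    # Reflect b through the plane. The specular path from a to p to b has the
    # same length as the straight segment from a to this mirror image.
    image = (bx, by, -bz)
    # Parameter at which the segment from a to the image crosses z = 0.
    t = a[2] / (a[2] + bz)
    return tuple(ai + t * (ii - ai) for ai, ii in zip(a, image))


print(mirror_connection((0.0, 0.0, 1.0), (2.0, 0.0, 1.0)))  # (1.0, 0.0, 0.0)
```

&lt;p&gt;A real renderer would still need to check that the solved point lies within the mirror’s actual
extent and that both connecting segments are unoccluded; curved specular surfaces need an iterative
solve instead of this closed form.&lt;/p&gt;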
&lt;p&gt;Veach calls this “non-local” path sampling, because it looks at vertices that aren’t just
adjacent to each other on the path, but farther apart.&lt;/p&gt;
&lt;p&gt;Veach merely sketches this idea and remarks that it could be useful. Since then, non-local
sampling ideas have been researched in the &lt;a href=&#34;https://www.cs.cornell.edu/projects/manifolds-sg12/&#34;&gt;manifold exploration&lt;/a&gt;
family of techniques, such as &lt;a href=&#34;https://marc.droske.org/pdfs/2015_mnee.pdf&#34;&gt;Manifold Next-Event Estimation&lt;/a&gt;
and &lt;a href=&#34;https://rgl.epfl.ch/publications/Zeltner2020Specular&#34;&gt;Specular Manifold Sampling&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;extended-light-path-expressions&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#extended-light-path-expressions&#34; title=&#34;Permalink to this section&#34;&gt;Extended Light Path Expressions&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;You may have seen “regular expression” syntax describing the vertices of paths, like $LS^*DE$
and suchlike. In this notation, $L$ stands for a light source, $S$ for a (Dirac) specular scattering
event, $D$ a diffuse (or glossy) scattering event, and $E$ for the camera/eye.
It’s a concise way to classify which kinds of paths are handled by different techniques. These
“light path expressions” are widely used in the literature, as well as in production renderers to
split off different lighting components into separate framebuffers.&lt;/p&gt;
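&lt;p&gt;Since these really are regular expressions over a four-letter alphabet, an ordinary regex engine
can classify paths. Here’s a minimal sketch using Python’s standard &lt;code&gt;re&lt;/code&gt; module; the helper
name &lt;code&gt;matches_lpe&lt;/code&gt; and the one-letter-per-vertex string encoding are assumptions for
illustration, and production renderers typically define their own, richer grammars.&lt;/p&gt;

```python
import re


def matches_lpe(expression, path):
    """Return True if a path string such as "LSSDE" fits the whole expression."""
    return re.fullmatch(expression, path) is not None


# One symbol per vertex, listed from the light to the eye.
caustic_or_direct = "LS*DE"  # light, any number of specular bounces, one diffuse bounce, eye

print(matches_lpe(caustic_or_direct, "LSSDE"))  # True
print(matches_lpe(caustic_or_direct, "LDE"))    # True
print(matches_lpe(caustic_or_direct, "LDDE"))   # False
```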
&lt;p&gt;Veach describes an extension to this notation in which extra $D$ and $S$ symbols are added to
denote the continuity or discreteness of lights and cameras, in both position and directionality.
For example, a point light (positionally “specular”) that radiates in all directions (“diffuse”)
would be denoted $LSD$. A punctual directional light would be $LDS$, and an area light
would be $LDD$. The camera is described likewise, but in the opposite order: $DSE$ is a pinhole
camera, while $DDE$ is a camera with a physical lens area. These substrings are used as prefixes and
suffixes for what he calls “full-path” regular expressions.&lt;/p&gt;
&lt;p&gt;There’s a certain elegance to this idea, but I have to admit I found it confusing in practice, even
after reading several chapters using these extended expressions. I had to keep looking
up which symbol was the position and which was the direction, and stopping to think about what those
labels mean in the context of a light source or camera.&lt;/p&gt;
&lt;p&gt;This extended syntax doesn’t seem to have been adopted by much later literature, but I did see it
used in the &lt;a href=&#34;https://cg.ivd.kit.edu/publications/p2013/PSR_Kaplanyan_2013/PSR_Kaplanyan_2013.pdf&#34;&gt;Path Space Regularization&lt;/a&gt;
paper by Kaplanyan and Dachsbacher. They also print the light and camera substrings in different
colors, to improve their readability.&lt;/p&gt;
&lt;h2 id=&#34;multiple-importance-sampling&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#multiple-importance-sampling&#34; title=&#34;Permalink to this section&#34;&gt;Multiple Importance Sampling&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Alright, now we’re getting into the real meat of Veach’s thesis! In a sense, all the foregoing
material was just setup and preparation for the last three chapters, which contain the thesis’s major
original contributions.&lt;/p&gt;
&lt;p&gt;I’ll assume you’re familiar with the basic ideas of multiple importance sampling, the balance heuristic,
and the power heuristic. If you need a refresher, here’s the &lt;a href=&#34;https://www.pbr-book.org/3ed-2018/Monte_Carlo_Integration/Importance_Sampling#MultipleImportanceSampling&#34;&gt;relevant section of PBR&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;the-balance-heuristic&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#the-balance-heuristic&#34; title=&#34;Permalink to this section&#34;&gt;The Balance Heuristic&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;There are some great insights here about the interpretation of the balance heuristic that I
hadn’t seen before. Using the balance heuristic to combine samples from a collection of
probability distributions $p_i(x)$ (e.g., light source sampling and BSDF sampling)
turns out to be equivalent to sampling from a &lt;strong&gt;single&lt;/strong&gt; distribution, whose probability
density is the average of all the constituent ones:
$$
p_\text{mis}(x) = \frac{1}{N} \sum_i p_i(x)
$$
Intuitively, this is useful because the combined distribution inherits all of the peaks of the
distributions contributing to it. If one sampling strategy is “good at” sampling a certain region
of the integration domain, its $p_i(x)$ will tend to have a peak in that region. When several PDFs
are averaged together, the resulting distribution has peaks (albeit smaller ones) everywhere any of
the included strategies has a peak.&lt;/p&gt;
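&lt;p&gt;Here’s a short derivation of that equivalence, as a sketch under the simplifying assumption that we
take exactly one sample $x_i$ from each of the $N$ strategies. The balance heuristic weights are
$w_i(x) = p_i(x) / \sum_j p_j(x)$, so the combined estimator is
$$
F = \sum_i w_i(x_i) \, \frac{f(x_i)}{p_i(x_i)} = \sum_i \frac{f(x_i)}{\sum_j p_j(x_i)} = \frac{1}{N} \sum_i \frac{f(x_i)}{p_\text{mis}(x_i)}
$$
The $p_i$ in the weight cancels the $p_i$ in the denominator, and what’s left has the form of an
ordinary importance-sampling estimator that divides by the averaged density.&lt;/p&gt;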
&lt;p&gt;As an illustration, here are two fictitious “PDFs” I made up, and their average:&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;A probability distribution with a narrow peak on the left&#34; class=&#34;max-width-50 invert-when-dark&#34; src=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/pdf01.png&#34; title=&#34;A probability distribution with a narrow peak on the left&#34; /&gt;
&lt;img alt=&#34;A probability distribution with a broad peak on the right&#34; class=&#34;max-width-50 invert-when-dark&#34; src=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/pdf02.png&#34; title=&#34;A probability distribution with a broad peak on the right&#34; /&gt;
&lt;img alt=&#34;The averaged probability distribution, with both a narrow peak on the left and a broad peak on the right&#34; class=&#34;max-width-50 invert-when-dark&#34; src=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/pdfmis.png&#34; title=&#34;The averaged probability distribution, with both a narrow peak on the left and a broad peak on the right&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The third curve, which simulates MIS with the balance heuristic, combines the peaks of
the first two.&lt;/p&gt;
&lt;p&gt;Here are all three curves together:
&lt;img alt=&#34;All three PDFs plotted together&#34; class=&#34;max-width-50 invert-when-dark&#34; src=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/pdfall.png&#34; title=&#34;All three PDFs plotted together&#34; /&gt;&lt;/p&gt;
&lt;p&gt;So, the balance heuristic combines the strengths of the sampling strategies within it:
it’s “pretty good at” sampling all the regions that any of the constituent strategies is “good at”.&lt;/p&gt;
&lt;p&gt;A corollary of this fact is that the balance heuristic will assign a given path the same
contribution weight no matter which strategy generated it. This isn’t the case
for other MIS weighting functions, such as the power heuristic.&lt;/p&gt;
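&lt;p&gt;That corollary is easy to check numerically. The sketch below is purely illustrative: the two Gaussian
“strategies”, the integrand, and all the names are made up, and equal sample counts per strategy are assumed.
It evaluates the balance-weighted contribution of the same point as if it had been drawn from each strategy in turn.&lt;/p&gt;

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Normal density, standing in for the PDF of one sampling strategy."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Two made-up strategies: a narrow peak on the left, a broad one on the right.
STRATEGIES = [
    lambda x: gaussian_pdf(x, -1.0, 0.2),
    lambda x: gaussian_pdf(x, 1.5, 1.0),
]


def integrand(x):
    """An arbitrary smooth function to integrate."""
    return 1.0 + x * x


def contribution(i, x):
    """Balance-weighted contribution of a sample at x drawn from strategy i."""
    pdfs = [p(x) for p in STRATEGIES]
    weight = pdfs[i] / sum(pdfs)
    return weight * integrand(x) / pdfs[i]


x = 0.3
print(contribution(0, x), contribution(1, x))  # equal up to rounding
```

&lt;p&gt;Both values come out to the integrand divided by the sum of the PDFs at that point, regardless of
which strategy “owns” the sample.&lt;/p&gt;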
&lt;h3 id=&#34;the-power-heuristic&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#the-power-heuristic&#34; title=&#34;Permalink to this section&#34;&gt;The Power Heuristic&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The power heuristic doesn’t have quite such a tidy interpretation; it’s not equivalent to sampling
any single distribution. It intuitively does something similar to the balance heuristic, but also
“sharpens” the weights, making small contributions smaller and large ones larger.&lt;/p&gt;
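&lt;p&gt;To make the “sharpening” concrete: with equal sample counts, the power heuristic weight is
$w_i = p_i^\beta / \sum_j p_j^\beta$, where $\beta = 1$ recovers the balance heuristic and $\beta = 2$ is
the exponent Veach suggests. Below is a small sketch; the function name and the example PDF values are
made up for illustration.&lt;/p&gt;

```python
def mis_weights(pdf_values, beta=1.0):
    """Weights for one sample, given every strategy's PDF value at that sample.

    beta = 1 gives the balance heuristic; beta = 2 gives the power heuristic
    with the exponent suggested in the thesis. Equal sample counts are assumed.
    """
    powered = [p ** beta for p in pdf_values]
    total = sum(powered)
    return [p / total for p in powered]


print(mis_weights([3.0, 1.0]))       # balance: [0.75, 0.25]
print(mis_weights([3.0, 1.0], 2.0))  # power:   [0.9, 0.1]
```

&lt;p&gt;The strategy that was already three times more likely to produce the sample goes from 75% of the
weight to 90%, and the weaker one drops from 25% to 10%.&lt;/p&gt;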
&lt;p&gt;According to
Veach, this is helpful to reduce variance in areas where one of the included strategies is
already a very close match for the integrand. In those cases, MIS isn’t really needed, and the
balance heuristic can actually make things worse. The power heuristic makes things less worse.&lt;/p&gt;
&lt;p&gt;There’s a great graph in the thesis (Figure 9.10) showing actual variance measurements for light
source sampling, BSDF sampling, and the two combined with the balance heuristic or the power heuristic:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/veach-mis-variance.jpg&#34;
    class=&#34;max-width-50 invert-when-dark&#34;
    title=&#34;Figure 9.10 excerpted from Veach&#39;s thesis&#34;
    alt=&#34;Graphs of variance versus surface roughness for different sampling strategies. Light source sampling performs well at high roughness but very poorly at low roughness, and BSDF sampling is the opposite. The balance heuristic and power heuristic both perform well over the full range of roughness values. The power heuristic gives the lowest variance overall.&#34; /&gt;&lt;/p&gt;
&lt;p&gt;These are plotted logarithmically over several orders of magnitude in surface roughness, so they
give some nice concrete evidence about the efficacy of MIS in reducing variance across a wide
range of shading situations.&lt;/p&gt;
&lt;h3 id=&#34;mis-examples&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#mis-examples&#34; title=&#34;Permalink to this section&#34;&gt;MIS Examples&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;We’ve all seen that classic MIS showcase image, with the different light source sizes versus material
roughnesses. That comes from this thesis, of course!
&lt;a href=&#34;https://www.shadertoy.com/view/lsV3zV&#34;&gt;Here’s a neat Shadertoy rendition&lt;/a&gt; of it, created by Maxwell Planck:&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;Recreation of Veach&#39;s classic MIS demo scene, with light source samples in red and BSDF samples in green&#34; class=&#34;not-too-wide&#34; src=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/mis01.jpg&#34; title=&#34;Recreation of Veach&#39;s classic MIS demo scene, with light source samples in red and BSDF samples in green&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Light source samples are color-coded red, and BSDF samples are green; this is a nice way to
visualize how the two get weighted differently across the image.&lt;/p&gt;
&lt;p&gt;However, I was interested to see that Veach also has a second demo scene, which I haven’t come across
before. It’s simpler and less “pretty” than the more famous one above,
but in my mind it demonstrates the value of MIS even more starkly.&lt;/p&gt;
&lt;p&gt;This scene just consists of a large emissive surface at right angles to a diffuse surface:&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;Simple MIS demo scene, with light source samples in red and BSDF samples in green&#34; class=&#34;not-too-wide&#34; src=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/mis02.jpg&#34; title=&#34;Simple MIS demo scene, with light source samples in red and BSDF samples in green&#34; /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href=&#34;https://www.shadertoy.com/view/mllSDS&#34;&gt;Shadertoy here&lt;/a&gt;, which I adapted from Planck’s.)&lt;/p&gt;
&lt;p&gt;Depending on how far you are from the light, either BSDF sampling or light source sampling is more
effective at estimating the illumination. So, you don’t even need a whole range of material roughnesses
to benefit from MIS; area lights and diffuse walls are enough!&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis-2/#conclusion&#34; title=&#34;Permalink to this section&#34;&gt;Conclusion&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I’ve known about multiple importance sampling for a long time, but I never felt like I quite
got my head around it. I had the idea that it was something about shifting weight toward whichever
sampling method gives you the “highest quality” samples in a given region, but it always
seemed a little magical to me how you could determine that from purely local information (the pdfs
at a single sample point).&lt;/p&gt;
&lt;p&gt;I’m glad I took the time to read through Veach’s own explanation of this, as it goes into a lot more
detail about the meaning and intuition behind the balance heuristic. I have a much better
understanding of how and why it works, now.&lt;/p&gt;
&lt;p&gt;One thing I didn’t get to address here (because I didn’t have much useful to say
about it) was the optimality(-ish) proofs Veach gives. There are a few theorems proved in this
chapter that roughly say something like “this heuristic might not be the best one, but it’s not
that far behind the best one”. I’d like to contextualize these results better (what justifies saying
it’s “not that far”?), but I haven’t yet found the right angle.&lt;/p&gt;
&lt;p&gt;The last couple chapters in the thesis are about bidirectional path tracing and Metropolis light
transport. This post has stretched long enough, so those will have to wait for another time!&lt;/p&gt;</description>
			</item>
			<item>
				<title>Reading Veach’s Thesis</title>
				<link>https://www.reedbeta.com/blog/reading-veach-thesis/</link>
				<guid>https://www.reedbeta.com/blog/reading-veach-thesis/</guid>
				<dc:creator>Nathan Reed</dc:creator>
				<pubDate>Sat, 03 Dec 2022 14:50:39 -0800</pubDate>
				<comments>https://www.reedbeta.com/blog/reading-veach-thesis/#comments</comments>
					<category>Graphics</category>
					<category>Math</category>
				<description>&lt;p&gt;If you’ve studied path tracing or physically-based rendering in the last twenty years,
you’ve probably heard of &lt;a href=&#34;https://en.wikipedia.org/wiki/Eric_Veach&#34;&gt;Eric Veach&lt;/a&gt;. His Ph.D. thesis,
published in 1997, has been hugely influential in Monte Carlo rendering. Veach introduced
key techniques like multiple importance sampling and bidirectional path tracing, and
clarified a lot of the mathematical theory behind Monte Carlo rendering. These ideas not
only inspired a great deal of later research, but are still used in production renderers today.&lt;/p&gt;
&lt;p&gt;Recently, I decided to sit down and read this classic thesis in full. Although I’ve seen expositions of
the central ideas in other places such as &lt;a href=&#34;https://www.pbr-book.org/&#34;&gt;PBR&lt;/a&gt;, I’d never gone back to
the original source. The thesis is &lt;a href=&#34;http://graphics.stanford.edu/papers/veach_thesis/&#34;&gt;available from Stanford’s site&lt;/a&gt;
(scroll down to the very bottom for PDF links). It’s over 400 pages—a textbook in its own right—but
I’ve found it very readable, with clearly presented ideas and incisive analysis. There’s a lot of formal
math, too, but you don’t really need more than linear algebra, calculus, and some probability theory
to understand it. I’m only about halfway through, but there have already been some really interesting
bits that I’d like to share. So hop in, and let’s read Veach’s thesis together!&lt;/p&gt;
&lt;!--more--&gt;
&lt;p&gt;&lt;img alt=&#34;Mean Girls meme: &amp;quot;Get in loser, we&#39;re reading Veach&#39;s thesis&amp;quot;&#34; src=&#34;https://www.reedbeta.com/blog/reading-veach-thesis/get-in-loser.jpg&#34; title=&#34;Mean Girls meme: &amp;quot;Get in loser, we&#39;re reading Veach&#39;s thesis&amp;quot;&#34; /&gt;&lt;/p&gt;
&lt;p&gt;This isn’t going to be a comprehensive review of everything in the thesis—it’s just a selection
of things that made me go “oh, that’s cool”, or “huh! I didn’t know that”.&lt;/p&gt;
&lt;div class=&#34;toc&#34;&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis/#unbiased-vs-consistent-algorithms&#34;&gt;Unbiased vs Consistent Algorithms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis/#photon-phase-space&#34;&gt;Photon Phase Space&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis/#incident-and-exitant-radiance&#34;&gt;Incident and Exitant Radiance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis/#reciprocity-and-adjoint-bsdfs&#34;&gt;Reciprocity and Adjoint BSDFs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis/#non-reciprocity-of-refraction&#34;&gt;Non-Reciprocity of Refraction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis/#conclusion&#34;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&#34;unbiased-vs-consistent-algorithms&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis/#unbiased-vs-consistent-algorithms&#34; title=&#34;Permalink to this section&#34;&gt;Unbiased vs Consistent Algorithms&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;You’ve probably heard people talk about “bias” in rendering algorithms and how unbiased algorithms
are better. Sounds reasonable: bias is bad and wrong, right? But then there’s this other thing
called “consistent” that algorithms can be, which makes them kind of okay even if they’re biased?
I’ve encountered these concepts in the graphics world but never really saw a clear explanation of
them (especially “consistent”).&lt;/p&gt;
&lt;p&gt;Veach has a pretty nice one-page explanation of what this is and why it matters (§1.4.4). Briefly,
bias is when the mean value of the estimator is wrong, independent of the noise due to random
sampling. “Consistent” is when the algorithm’s bias approaches zero as you take more
samples. An unbiased algorithm generates samples that are randomly spread around the true, correct
answer from the very beginning. A consistent algorithm generates samples that are randomly spread
around a wrong answer to begin with, but then they migrate toward the right answer over time.&lt;/p&gt;
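&lt;p&gt;A toy example of the distinction (my own, not from the thesis): suppose $X_1, \ldots, X_N$ are
independent samples with mean $\mu$ and finite variance, and compare the usual average with a version
that divides by $N + 1$ instead:
$$
    \mathrm{E}\left[\frac{1}{N} \sum_i X_i\right] = \mu, \qquad
    \mathrm{E}\left[\frac{1}{N+1} \sum_i X_i\right] = \frac{N}{N+1} \, \mu
$$
The first estimator is unbiased for every $N$. The second is biased by $-\mu / (N+1)$, but that bias
shrinks to zero as $N$ grows, so it’s consistent.&lt;/p&gt;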
&lt;p&gt;The reason it matters is that with an unbiased algorithm, you can track the variance in your samples
and get a good idea of how much error there is, and you can accurately predict how many samples it’s
going to take to get the error down to a given level. With a biased but consistent algorithm,
you could have a situation where it looks like it’s converged because the samples have low variance,
but it’s converged to an inaccurate value. You have no real way to detect that, and no way to tell
how many more samples might be necessary to achieve a given error bound.&lt;/p&gt;
&lt;h2 id=&#34;photon-phase-space&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis/#photon-phase-space&#34; title=&#34;Permalink to this section&#34;&gt;Photon Phase Space&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The classic Kajiya rendering equation deals with this quantity called “radiance” that’s notoriously
hard to get a handle on, both intuitively and mathematically. We’re
usually shown a definition that has some derivative-looking notation like
$$
L = \frac{\mathrm{d}^2\Phi}{\mathrm{d}A \, \mathrm{d}\omega \cos \theta}
$$
which, like, what? What is the actual function that is being differentiated here? What are the
variables? What does this even mean?&lt;/p&gt;
&lt;p&gt;If you’re the sort of person who feels more secure when things like this are put on an explicit, formal
mathematical footing, Chapter 3 is for you. Veach takes it back to physics by defining a phase
space (state space) for photons. Each photon has a position, direction, and wavelength,
so the phase space is 6-dimensional (3 + 2 + 1). We can imagine the photons in the scene as a cloud of points in
this space, moving around with time, spawning at light sources and occasionally dying when absorbed
at surfaces.&lt;/p&gt;
&lt;p&gt;Then, all the usual radiometric quantities like flux, irradiance, radiance, and so on can be
defined in terms of measuring the density of photons (or rather, their energy density) in various subsets of
this space. For example, radiance is defined in terms of the photons flowing through a given patch
of surface, with directions within a given cone, and then taking a limit as the surface patch and
cone sizes go to zero. This kind of limiting procedure is formalized using measure theory, as a
&lt;a href=&#34;https://en.wikipedia.org/wiki/Radon%E2%80%93Nikodym_theorem&#34;&gt;Radon–Nikodym derivative&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;incident-and-exitant-radiance&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis/#incident-and-exitant-radiance&#34; title=&#34;Permalink to this section&#34;&gt;Incident and Exitant Radiance&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Another thing we get from this notion of photon phase space is a precise
distinction between incident and exitant radiance. The rendering equation describes how to calculate
$L_o$ (exitant radiance, leaving the surface) in terms of the BSDF and $L_i$ (incident radiance,
arriving at the surface). But then how are these $L_o$ and $L_i$ related to each other? There’s just
one unified radiance field, not two; but trying to define it as a function of position and direction,
$L(x, \omega)$, we run into some awkwardness at points on surfaces because the radiance changes
discontinuously there.&lt;/p&gt;
&lt;p&gt;Veach §3.5 gives a nice definition of incident and exitant radiance functions in terms of the
photon phase space, by looking at trajectories moving toward the surface or away from it in time.
(To be fair, I think this could be done as well by looking at &lt;a href=&#34;https://en.wikipedia.org/wiki/One-sided_limit&#34;&gt;one-sided limits&lt;/a&gt;
of the 3D radiance field as you approach the surface from either direction.)&lt;/p&gt;
&lt;h2 id=&#34;reciprocity-and-adjoint-bsdfs&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis/#reciprocity-and-adjoint-bsdfs&#34; title=&#34;Permalink to this section&#34;&gt;Reciprocity and Adjoint BSDFs&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Much of the thesis in Chapters 4–7 is concerned with how to handle non-reciprocal BSDFs—or, as
Veach calls them, non-symmetric. We’re often told that BSDFs “should” obey a reciprocity law,
$f(\omega_i \to \omega_o) = f(\omega_o \to \omega_i)$, in order to be well-behaved. However, Veach
points out that non-reciprocal BSDFs are commonplace and unavoidable in practice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Refraction is non-reciprocal (§5.2). Radiance changes by a factor of $\eta_o^2 / \eta_i^2$
    when refracted (more about this in the next section); reverse the direction of light, and it
    inverts this factor.&lt;/li&gt;
&lt;li&gt;Shading normals are non-reciprocal (§5.3). Shading normals can be interpreted as a factor
    $|\omega_i \cdot n_s| / |\omega_i \cdot n_g|$ multiplied into the BSDF. Note that this
    expression involves only $\omega_i$ and not $\omega_o$, so if those directions are swapped, this
    value will in general be different.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Does this spell doom for physically-based rendering algorithms? Surprisingly, no. According to Veach,
it just means we
have to be careful about the order of arguments to our BSDFs, and not treat them as interchangeable.
The rendering will still work as long as we’re consistent about which direction light is flowing.
(It’s a bit like working with non-commutative algebra; you can still
do most of the same things, you just need to take care to preserve the order of multiplications.)&lt;/p&gt;
&lt;p&gt;For photon mapping or bidirectional
path tracing, we might need two separate importance-sampling routines: one to sample $\omega_i$
given $\omega_o$ (when tracing from the camera) and one to sample $\omega_o$ given $\omega_i$ (when
tracing from a light source).&lt;/p&gt;
&lt;p&gt;Another way to think about it is that when light is emitted and scatters through the scene, it uses the
regular BSDF, but when “importance” is emitted by cameras and scatters through the scene, it uses
the &lt;em&gt;adjoint&lt;/em&gt; BSDF—which is just the BSDF with its arguments swapped (§3.7.6). Then both directions
of scattering give consistent results and can be intermixed in algorithms.&lt;/p&gt;
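&lt;p&gt;In symbols, writing $f^*$ for the adjoint (notation assumed here for illustration):
$$
    f^*(\omega_i \to \omega_o) = f(\omega_o \to \omega_i)
$$
so a reciprocal BSDF is exactly one that equals its own adjoint, $f^* = f$.&lt;/p&gt;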
&lt;h2 id=&#34;non-reciprocity-of-refraction&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis/#non-reciprocity-of-refraction&#34; title=&#34;Permalink to this section&#34;&gt;Non-Reciprocity of Refraction&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I was not previously aware that radiance should be scaled by $\eta_o^2 / \eta_i^2$ when a ray is
refracted! This fact somehow passed me by in everything I’ve read about physically-based light
transport (although when I looked, I found PBR discussing this issue in &lt;a href=&#34;https://www.pbr-book.org/3ed-2018/Light_Transport_III_Bidirectional_Methods/The_Path-Space_Measurement_Equation#Non-symmetricScattering&#34;&gt;§16.1.3&lt;/a&gt;).
The radiance changes because light gets compressed into a smaller range of directions when refracted,
as this diagram (excerpted from the thesis) shows:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.reedbeta.com/blog/reading-veach-thesis/veach-refraction.jpg&#34;
    class=&#34;invert-when-dark&#34;
    title=&#34;Figure 5.2 excerpted from Veach&#39;s thesis&#34;
    alt=&#34;A diagram showing incident rays in the upper hemisphere of a surface compressed into a narrower cone of rays in the lower hemisphere. Caption reads, &amp;quot;Figure 5.2: When light enters a medium with a higher refractive index, the same light energy is squeezed into a smaller volume. This causes the radiance along each ray to increase.&amp;quot;&#34; /&gt;&lt;/p&gt;
&lt;p&gt;So, a ray entering a glass object should have its radiance more than
doubled. However, the scaling is undone when the ray exits the glass again. That explains why you can
often get away without modeling this radiance scaling in a renderer; if the camera and all light sources
are outside of any refractive media, there’s no visible effect. This would only show up if,
for instance, some light sources were inside a medium—and would only show up as those light sources
being a little dimmer than they should be, which would be easy to overlook (and easy for an artist to
compensate by bringing those lights up a bit).&lt;/p&gt;
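&lt;p&gt;To put numbers on it, take glass with $\eta = 1.5$ surrounded by air with $\eta = 1$ (typical textbook
values, used here just for illustration, and ignoring Fresnel losses):
$$
    \text{entering: } \frac{1.5^2}{1^2} = 2.25, \qquad
    \text{exiting: } \frac{1^2}{1.5^2} \approx 0.444, \qquad
    2.25 \times \frac{1}{2.25} = 1
$$
The two factors cancel exactly, which is why a path that starts and ends in air never sees the scaling.&lt;/p&gt;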
&lt;p&gt;However, the radiance scaling does become important when we use things like photon mapping and
bidirectional path tracing, where we have to use the adjoint BSDF when tracing from the light
sources. Then, the $\eta^2$ factors apply inversely to these paths, which is important to get right,
or else the bidirectional methods won’t be consistent with unidirectional ones.&lt;/p&gt;
&lt;p&gt;Veach also derives (§6.2) a generalized reciprocity relationship that holds for BSDFs with
refraction (in the absence of shading normals):
$$
    \frac{f(\omega_i \to \omega_o)}{\eta_o^2} = \frac{f(\omega_o \to \omega_i)}{\eta_i^2}
$$
He proposes that instead of tracking radiance $L$ along paths, we instead track the quantity
$L/\eta^2$. When BSDFs are written with respect to this modified radiance, the $\eta^2$ factors
cancel out and the BSDF becomes symmetric again. In this case, no scaling needs to be done
as the ray traverses different media, and paths in both directions can operate by the same rules;
only at the ends of the path (at the camera and at lights) do some $\eta^2$ factors need to be
incorporated. Veach argues that this is a simpler and easier-to-implement approach to path tracing
overall.&lt;/p&gt;
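&lt;p&gt;To see why the rescaled quantity needs no adjustment at an interface, write the refraction scaling as $L_o = (\eta_o^2 / \eta_i^2) \, L_i$, where $L_i$ and $L_o$ are the radiance just before and just after the interface (notation introduced here for illustration, not necessarily Veach’s). Dividing both sides by $\eta_o^2$ gives
$$
    \frac{L_o}{\eta_o^2} = \frac{L_i}{\eta_i^2}
$$
so $L/\eta^2$ carries across the interface unchanged, setting aside any Fresnel or absorption losses.&lt;/p&gt;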
&lt;p&gt;It’s interesting to note, though, that PBRT doesn’t take Veach’s suggested approach here; it tracks
unscaled radiance, and puts in the correct scaling factors due to refraction, for paths in both
directions.&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/reading-veach-thesis/#conclusion&#34; title=&#34;Permalink to this section&#34;&gt;Conclusion&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The refraction scaling business was the most surprising point for me in what I’ve read so far, but
Veach’s argument for non-symmetric scattering being OK as long as you take care to handle it correctly
was also very intriguing!&lt;/p&gt;
&lt;p&gt;That brings us to the end of Chapter 7, which is about halfway through. The next chapters are about
multiple importance sampling, bidirectional path tracing, and Metropolis sampling. I hope this was
interesting, and maybe I’ll do a follow-up post when I’ve finished it!&lt;/p&gt;</description>
			</item>
			<item>
				<title>Texture Gathers and Coordinate Precision</title>
				<link>https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/</link>
				<guid>https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/</guid>
				<dc:creator>Nathan Reed</dc:creator>
<pubDate>Sat, 15 Jan 2022 08:21:17 -0800</pubDate><comments>https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#comments</comments>					<category>Graphics</category>
					<category>GPU</category>
				<description>&lt;p&gt;A few years ago I came across an interesting problem. I was trying to implement some custom texture
filtering logic in a pixel shader. It was for a shadow map, and I wanted to experiment with filters
beyond the usual hardware bilinear.&lt;/p&gt;
&lt;p&gt;I went about it by using texture gathers to retrieve a neighborhood of texels, then
performing my own filtering math in the shader. I used &lt;code&gt;frac&lt;/code&gt; on the scaled texture coordinates to
figure out where in the texel I was, emulating the logic the GPU texture unit would have used to
calculate weights for bilinear filtering.&lt;/p&gt;
&lt;p&gt;To my surprise, I noticed a strange artifact in the resulting image when I got the camera close to a
surface. A grid of flickery, stipply lines appeared, delineating the texels in the soft edges of the
shadows—but not in areas that were fully shadowed or fully lit. What was going on?&lt;/p&gt;
&lt;!--more--&gt;
&lt;div class=&#34;toc&#34;&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#into-the-texture-verse&#34;&gt;Into the Texture-Verse&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#precision-limited-edition&#34;&gt;Precision, Limited Edition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#eight-is-a-magic-number&#34;&gt;Eight is a Magic Number&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#interlude-nearest-filtering&#34;&gt;Interlude: Nearest Filtering&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#conclusion&#34;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;figure class=&#34;not-too-wide&#34; alt=&#34;Artifacts in shadow due to gather mismatch&#34; title=&#34;Artifacts in shadow due to gather mismatch&#34; &gt;
&lt;a href=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/shadow-artifact.jpg&#34;&gt;&lt;img src=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/shadow-artifact.jpg&#34;/&gt;&lt;/a&gt;      &lt;figcaption&gt;&lt;p&gt;Dramatic reenactment of the artifact that started me on this investigation.&lt;/p&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;

&lt;p&gt;After some head-scratching and experimenting, I understood a little more about the source of these
errors. In the affected pixels, there was a mismatch between the texels returned by
the gather and the texels that the shader &lt;em&gt;thought&lt;/em&gt; it was working with.&lt;/p&gt;
&lt;p&gt;You see, the objective of a gather operation is to retrieve the set of four texels that would be
used for bilinear filtering, if that’s what we were doing. You give it a UV position, and it finds
the 2×2 quad of texels whose centers surround that point, and returns all four of them in a vector
(one channel at a time).&lt;/p&gt;
&lt;p&gt;As the UV position moves through the texture, when it crosses the line between texel centers, the
gather will switch to returning the next set of four texels.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;Diagram of texels returned by a gather operation&#34; class=&#34;not-too-wide only-light-theme&#34; src=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/gather.png&#34; title=&#34;Diagram of texels returned by a gather operation&#34; /&gt;
&lt;img alt=&#34;Diagram of texels returned by a gather operation&#34; class=&#34;not-too-wide only-dark-theme&#34; src=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/gather-dark.png&#34; title=&#34;Diagram of texels returned by a gather operation&#34; /&gt;&lt;/p&gt;
&lt;p&gt;In this diagram, the large labeled squares are texels. Whenever the input UV position is within
the solid blue box, the gather returns texels ABCD. If the input point moves
to the right and crosses into the dotted blue box, then the gather will suddenly start returning
BEDF instead. It’s a step function—a discontinuity.&lt;/p&gt;
&lt;p&gt;Meanwhile, in my pixel shader I’m calculating weights for combining these texels according to some
filter. To do that, I need to know where I am within the current gather quad. The expression for
this is:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;kt&#34;&gt;float2&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;texelFrac&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;frac&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;uv&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;textureSize&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;0.5&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;(The &lt;code&gt;- 0.5&lt;/code&gt; here is to make coordinates relative to texel centers instead of texel edges.)&lt;/p&gt;
&lt;p&gt;This &lt;code&gt;frac&lt;/code&gt; is &lt;em&gt;supposed&lt;/em&gt; to wrap around from 1 back to 0 at the exact same place where the gather switches
to the next set of texels. The &lt;code&gt;frac&lt;/code&gt; has a discontinuity, and it needs to match &lt;em&gt;exactly&lt;/em&gt; with the
discontinuity in the gather result, for the filter calculation to be consistent.&lt;/p&gt;
&lt;p&gt;But in my shader, they didn’t match. As I discovered, there was a region—a very small
region, but large enough to be visible—where the gather switched to the next set of texels
&lt;em&gt;before&lt;/em&gt; the &lt;code&gt;frac&lt;/code&gt; wrapped around to 0. Then, the shader blithely made its weight calculations for
the wrong set of texels, with ugly results.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;Texel squares according to frac (blue) and gather (yellow)&#34; class=&#34;not-too-wide only-light-theme&#34; src=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/gather-and-frac.png&#34; title=&#34;Texel squares according to frac (blue) and gather (yellow)&#34; /&gt;
&lt;img alt=&#34;Texel squares according to frac (blue) and gather (yellow)&#34; class=&#34;not-too-wide only-dark-theme&#34; src=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/gather-and-frac-dark.png&#34; title=&#34;Texel squares according to frac (blue) and gather (yellow)&#34; /&gt;&lt;/p&gt;
&lt;p&gt;This diagram is not to scale—the actual mismatch is much smaller than depicted here—but it
illustrates what was going on. It was as if the texel squares as judged by the gather
were the yellow squares, ever so slightly offset from the blue ones that I got by calculating
directly in the shader. Those flickery lines in the shadow appear whenever some
pixels happen to fall into the tiny slivers of space between these two conflicting accounts of
“where the texel grid is”.&lt;/p&gt;
&lt;p&gt;Now on the one hand, this suggests a simple fix. We can add a small offset to our calculation:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;k&#34;&gt;const&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;offset&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;cm&#34;&gt;/* TBD */&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;kt&#34;&gt;float2&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;texelFrac&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;frac&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;uv&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;textureSize&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;+&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;0.5&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;+&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;offset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Then we can empirically hand-tweak the value of &lt;code&gt;offset&lt;/code&gt;, and see if we can find a value that makes
the artifact go away.&lt;/p&gt;
&lt;p&gt;On the other hand, we’d really like to understand why this mismatch exists in the first place. And
as it turns out, once we understand it properly, we’ll be able to deduce the exact, correct value
for &lt;code&gt;offset&lt;/code&gt;—no hand-tweaking necessary.&lt;/p&gt;
&lt;h2 id=&#34;into-the-texture-verse&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#into-the-texture-verse&#34; title=&#34;Permalink to this section&#34;&gt;Into the Texture-Verse&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Texture gathers and samples are performed by a GPU’s “texture units”—fixed-function hardware
blocks that shaders call out to. From a shader author’s
point of view, texture units are largely a black box: put UVs in, get filtered results back.
But to address our questions about the behavior of gathers, we’ll need to dig down a bit into
what goes on inside that black box.&lt;/p&gt;
&lt;p&gt;We won’t (and can’t) go all the way down to the exact hardware architecture, as those
details are proprietary, and GPU vendors don’t share a lot about them. Fortunately, we won’t need to,
as we can get a general &lt;em&gt;logical&lt;/em&gt; picture of what’s happening on the basis of formal API specs,
which all the vendors’ texture units need to comply with.&lt;/p&gt;
&lt;p&gt;In particular, we can look at the
&lt;a href=&#34;https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm&#34;&gt;Direct3D functional spec&lt;/a&gt;
(written for D3D11, but applies to D3D12 as well),
and the &lt;a href=&#34;https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html&#34;&gt;Vulkan spec&lt;/a&gt;.
We could also look at OpenGL, but we won’t bother, as Vulkan generally specifies GPU
behavior as tightly as OpenGL, or more tightly.&lt;/p&gt;
&lt;p&gt;Let’s start with Direct3D. What does it have to say about how texture sampling works?&lt;/p&gt;
&lt;p&gt;Quite a bit—that’s
the topic of a lengthy section, &lt;a href=&#34;https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#7.18%20Texture%20Sampling&#34;&gt;§7.18 Texture Sampling&lt;/a&gt;.
There are numerous steps described for the sampling pipeline, including range reduction, texel
addressing modes, mipmap selection and anisotropy, and filtering. Let’s focus in on how the texels
to sample are determined in the case of (bi)linear filtering:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p class=&#34;attribution&#34;&gt;&lt;a href=&#34;https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#7.18.8%20Linear%20Sample%20Addressing&#34;&gt;D3D §7.18.8 Linear Sample Addressing&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;…Linear sampling in 1D selects the nearest two texels to the sample location and weights the texels
based on the proximity of the sample location to them.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Given a 1D texture coordinate in normalized space U, assumed to be any float32 value.&lt;/li&gt;
&lt;li&gt;U is scaled by the Texture1D size, and 0.5f is subtracted. Call this scaledU.&lt;/li&gt;
&lt;li&gt;scaledU is converted to at least 16.8 Fixed Point. Call this fxpScaledU.&lt;/li&gt;
&lt;li&gt;The integer part of fxpScaledU is the chosen left texel. Call this tFloorU. Note that the
  conversion to Fixed Point basically accomplished: tFloorU = floor(scaledU).&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The right texel, tCeilU is simply tFloorU + 1.&lt;/p&gt;
&lt;p&gt;…&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The procedure described above applies to linear sampling of a given miplevel of a Texture2D as well…&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;OK, here’s something interesting: “&lt;strong&gt;scaledU is converted to at least 16.8 Fixed Point.&lt;/strong&gt;” What’s that
about? Why would we want the texture sample coordinates to be in fixed-point, rather than staying in
the usual 32-bit floating-point?&lt;/p&gt;
&lt;p&gt;One reason is uniformity of precision. Another section of the D3D spec explains:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p class=&#34;attribution&#34;&gt;&lt;a href=&#34;https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#3.2.4%20Fixed%20Point%20Integers&#34;&gt;D3D §3.2.4 Fixed Point Integers&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Fixed point integer representations are used in a couple of places in D3D11…&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Texture coordinates for sampling operations are snapped to fixed point (after being scaled by
  texture size), to uniformly distribute precision across texture space, in choosing filter tap
  locations/weights. Weight values are converted back to floating point before actual filtering
  arithmetic is performed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;As you may know, floating-point values are designed to have finer precision when the value is closer
to 0. That means texture coordinates would be more precise near the origin of UV space, and less
elsewhere. However, image-space operations such as filtering should behave identically no matter their
position within the image. Fixed-point formats have the same precision everywhere, so they are
well-suited for this.&lt;/p&gt;
&lt;figure class=&#34;not-too-wide only-light-theme&#34; alt=&#34;Fixed-point texture coordinate grid (3 subpixel bits)&#34; title=&#34;Fixed-point texture coordinate grid (3 subpixel bits)&#34; &gt;
&lt;img src=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/fixed-point.png&#34;/&gt;        &lt;figcaption&gt;&lt;p&gt;Illustration of fixed-point texture coordinates, if there were only 3 subpixel bits (2&lt;sup&gt;3&lt;/sup&gt; = 8 subdivisions). Each dot is a possible fixed-point value. Two adjacent bilinear/gather footprints are highlighted in yellow and cyan.&lt;/p&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;figure class=&#34;not-too-wide only-dark-theme&#34; alt=&#34;Fixed-point texture coordinate grid (3 subpixel bits)&#34; title=&#34;Fixed-point texture coordinate grid (3 subpixel bits)&#34; &gt;
&lt;img src=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/fixed-point-dark.png&#34;/&gt;       &lt;figcaption&gt;&lt;p&gt;Illustration of fixed-point texture coordinates, if there were only 3 subpixel bits (2&lt;sup&gt;3&lt;/sup&gt; = 8 subdivisions). Each dot is a possible fixed-point value. Two adjacent bilinear/gather footprints are highlighted in yellow and cyan.&lt;/p&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;

&lt;p&gt;(Incidentally, you might wonder: don’t we already have non-uniform precision in the original float32
coordinates that the shader passed into the texture unit? Yes—but given current API limits on
texture sizes, the 24-bit float mantissa gives precision equal to or better than 16.8 fixed-point,
throughout at least the [0,1]² UV rectangle. You can still lose too much precision if you work with
too-large UV values in float32 format, though.)&lt;/p&gt;
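&lt;p&gt;As a rough sanity check of that claim, here is a small Python sketch comparing the spacing of float32 values with the UV-space step of 16.8 fixed-point. The 16384-texel texture size is an assumption (it is the D3D11 maximum 2D texture dimension), and &lt;code&gt;float32_spacing&lt;/code&gt; and &lt;code&gt;fixed_point_uv_step&lt;/code&gt; are hypothetical helper names, not part of any API.&lt;/p&gt;

```python
import math

def float32_spacing(x):
    """Gap between adjacent float32 values around a positive, normal x."""
    # frexp gives x = m * 2**e with m in [0.5, 1); float32 keeps 24
    # significant bits, so neighbours in that binade are 2**(e - 24) apart.
    _, e = math.frexp(x)
    return 2.0 ** (e - 24)

def fixed_point_uv_step(texture_size, subtexel_bits=8):
    """Step between adjacent fixed-point coordinates, measured in UV units."""
    return 1.0 / (texture_size * 2 ** subtexel_bits)

# Float precision varies across UV space: finer near 0, coarser near 1.
print(float32_spacing(0.001))   # equals 2**-33
print(float32_spacing(0.75))    # equals 2**-24

# Fixed-point precision is the same everywhere. For the assumed 16384-texel
# texture it equals 2**-22 in UV units, coarser than float32 anywhere in [0, 1).
print(fixed_point_uv_step(16384))
```

&lt;p&gt;Under these assumptions the float32 input is always at least four times finer than the fixed-point grid inside the unit square, so the snapping to fixed-point is what limits precision there, not the float input.&lt;/p&gt;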
&lt;p&gt;Another possible reason for using fixed-point in texture units is just that integer ALUs are
smaller and cheaper than floating-point ones. But there are a lot of other operations in
the texture pipeline still done in full float32 format, so this likely isn’t a major design concern.&lt;/p&gt;
&lt;h2 id=&#34;precision-limited-edition&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#precision-limited-edition&#34; title=&#34;Permalink to this section&#34;&gt;Precision, Limited Edition&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At this point, we can surmise that our mysterious gather discrepancy may have something to do with
coordinates being converted to “at least 16.8 fixed point”, per the D3D spec.&lt;/p&gt;
&lt;p&gt;These are the scaled texel coordinates, so the integer part of the value (the 16 bits in
front of the radix point) determines which texels we’re looking at, and then there are at least 8
more bits in the fractional part, specifying where we are within the texel.&lt;/p&gt;
&lt;p&gt;The minimum 8 bits of &lt;em&gt;sub-texel&lt;/em&gt; precision is also re-stated in various other locations in the spec,
such as:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p class=&#34;attribution&#34;&gt;&lt;a href=&#34;https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#7.18.16.1%20Texture%20Addressing%20and%20LOD%20Precision&#34;&gt;D3D §7.18.16.1 Texture Addressing and LOD Precision&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The amount of subtexel precision required (after scaling texture coordinates by texture size) is
at least 8-bits of fractional precision (2&lt;sup&gt;8&lt;/sup&gt; subdivisions).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The D3D spec text is also clear that conversion to fixed-point occurs &lt;em&gt;before&lt;/em&gt; taking the
integer part of the coordinate to determine which texels are filtered.&lt;/p&gt;
&lt;p&gt;But how does this end up inducing a tiny offset to the locations of texel squares, when we compare
the 32-bit float inputs to the fixed-point versions?&lt;/p&gt;
&lt;p&gt;There’s one more ingredient we need to look at, which is &lt;em&gt;how&lt;/em&gt; the conversion to fixed-point is
accomplished. Specifically: how does it do rounding? The 16.8 fixed-point has coarser precision than
the input floats in most cases, so floats will need to be snapped to one of the available 16.8 values.&lt;/p&gt;
&lt;p&gt;Back to our best friend, the D3D spec, which gives detailed rules about the various numeric formats,
the arithmetic rules they need to satisfy, and the processes for conversion amongst them. Regarding
conversion of floats to fixed-point:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p class=&#34;attribution&#34;&gt;&lt;a href=&#34;https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#3.2.4.1%20FLOAT%20-%3E%20Fixed%20Point%20Integer&#34;&gt;D3D §3.2.4.1 FLOAT -&amp;gt; Fixed Point Integer&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For D3D11 implementations are permitted 0.6f ULP tolerance in the integer result vs. the
infinitely precise value n*2^f after the last step above.&lt;/p&gt;
&lt;p&gt;The diagram below depicts the ideal/reference float to fixed conversion (including round-to-nearest-even),
yielding 1/2 ULP accuracy to an infinitely precise result, which is more accurate than required by
the tolerance defined above. Future D3D versions will require exact conversion like this reference.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;[in the “float32 -&amp;gt; Fixed Point Conversion” diagram:]&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Round the 32-bit value to a decimal that is extraBits to the left of the LSB end, using
  nearest-even.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;There’s the answer: the conversion uses rounding to nearest-even (the same as
the default mode for float math). This means floating-point values will be snapped to the nearest
fixed-point value, with ties breaking to the even side.&lt;/p&gt;
&lt;p&gt;Now, we’re finally in a position to explain the artifact that started this whole quest. When we pass
our float32 UVs into the texture unit, they get rounded to the nearest
fixed-point value at 8 subpixel bits—in other words, the nearest 1/256th of a texel. This means
that the last &lt;em&gt;half&lt;/em&gt; a bit—the last 1/512th of a texel—will round up to the next higher integer
texel value.&lt;/p&gt;
&lt;figure class=&#34;not-too-wide only-light-theme&#34; alt=&#34;Rounding to the nearest fixed-point texture coordinate&#34; title=&#34;Rounding to the nearest fixed-point texture coordinate&#34; &gt;
&lt;img src=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/round-nearest.png&#34;/&gt;      &lt;figcaption&gt;&lt;p&gt;When fixed-point conversion is done by round-to-nearest, all the points in the yellow square end up rounded to one of the yellow dots, and assigned the corresponding set of texels; likewise the cyan ones.&lt;/p&gt;
&lt;p&gt;Note how the squares are offset from the texel centers by half the grid spacing.&lt;/p&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;figure class=&#34;not-too-wide only-dark-theme&#34; alt=&#34;Rounding to the nearest fixed-point texture coordinate&#34; title=&#34;Rounding to the nearest fixed-point texture coordinate&#34; &gt;
&lt;img src=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/round-nearest-dark.png&#34;/&gt;     &lt;figcaption&gt;&lt;p&gt;When fixed-point conversion is done by round-to-nearest, all the points in the yellow square end up rounded to one of the yellow dots, and assigned the corresponding set of texels; likewise the cyan ones.&lt;/p&gt;
&lt;p&gt;Note how the squares are offset from the texel centers by half the grid spacing.&lt;/p&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;

&lt;p&gt;Therefore, in that last 1/512th, bilinear filtering operations and gathers will choose a one-higher
set of texels to interpolate between—while the shader computing &lt;code&gt;frac&lt;/code&gt; on the original float32
values will still think it’s in the original set of texels. This is exactly what we saw in
the original artifact!&lt;/p&gt;
&lt;p&gt;Accordingly, we can now see that the &lt;code&gt;frac&lt;/code&gt; input needs to be shifted by exactly 1/512th texel in
order to make its wrap point line up. It’s very much like the old C/C++ trick of adding 0.5 before
converting a float to integer, to obtain rounding instead of truncation.&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;k&#34;&gt;const&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;float&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;offset&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;1.0&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;512.0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;kt&#34;&gt;float2&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;texelFrac&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;frac&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;uv&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;textureSize&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;+&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;0.5&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;+&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;offset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Lo and behold, the flickery lines on the shadow are now completely gone. 👌🎉😎&lt;/p&gt;
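&lt;p&gt;To double-check the reasoning, here is a small CPU-side Python sketch. It is an emulation under stated assumptions, not a description of how any particular GPU is implemented: it assumes exactly 8 subtexel bits and the reference round-to-nearest-even conversion. The names &lt;code&gt;gather_left_texel&lt;/code&gt; and &lt;code&gt;shader_left_texel&lt;/code&gt; are hypothetical, standing for the texel index chosen by the texture unit and by the shader’s own &lt;code&gt;floor&lt;/code&gt;/&lt;code&gt;frac&lt;/code&gt; arithmetic, respectively.&lt;/p&gt;

```python
import math

SUBTEXEL_BITS = 8                 # assumed: the D3D minimum
ONE = 2 ** SUBTEXEL_BITS          # fixed-point representation of 1.0 texel

def gather_left_texel(scaled):
    """Texel index the texture unit would pick for a scaled coordinate
    (uv * textureSize - 0.5), following the fixed-point model in the D3D spec."""
    fixed = round(scaled * ONE)   # Python's round() breaks ties to even
    return fixed // ONE           # integer part of the fixed-point value

def shader_left_texel(scaled, offset=0.0):
    """Texel index implied by the shader's own floor/frac arithmetic."""
    return math.floor(scaled + offset)

def count_mismatches(offset, texels=4, steps_per_texel=4096):
    """Sweep finely across a few texels and count disagreements."""
    bad = 0
    for k in range(texels * steps_per_texel):
        scaled = k / steps_per_texel          # exact in binary floating point
        if gather_left_texel(scaled) != shader_left_texel(scaled, offset):
            bad += 1
    return bad

print(count_mismatches(offset=0.0))          # nonzero: the last 1/512 of each texel
print(count_mismatches(offset=1.0 / 512.0))  # 0: the wrap points now line up
```

&lt;p&gt;In this model the only samples that disagree without the offset are the ones in the last 1/512th of each texel, and adding 1/512 removes every one of them, including the tie cases that round to even.&lt;/p&gt;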
&lt;h2 id=&#34;eight-is-a-magic-number&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#eight-is-a-magic-number&#34; title=&#34;Permalink to this section&#34;&gt;Eight is a Magic Number&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;All GPUs that support D3D11—which means essentially all PC desktop/laptop GPUs from the last decade
and a half—should be compliant with the D3D spec, so they should all be rounding and converting their
texture coordinates the same way. Except that there’s still some wiggle room there: the
spec only prescribes 8 subtexel bits as a &lt;em&gt;minimum&lt;/em&gt;. GPU designers have the option to use &lt;em&gt;more&lt;/em&gt; than 8,
if they wish. How many bits do they actually use?&lt;/p&gt;
&lt;p&gt;Let’s see what Vulkan has to say about it. The Vulkan spec’s chapter
&lt;a href=&#34;https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#textures&#34;&gt;§16 Image Operations&lt;/a&gt;
describes much the same operations as the D3D spec, but at a more abstract mathematical level—it
doesn’t nail down the exact sequence of operations and precision the way D3D does. In particular,
Vulkan doesn’t say what numeric format should be used for the &lt;code&gt;floor&lt;/code&gt; operation that extracts the
integer texel coordinates. However, it does say:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p class=&#34;attribution&#34;&gt;&lt;a href=&#34;https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#_unnormalized_texel_coordinate_operations&#34;&gt;VK §16.6 Unnormalized Texel Coordinate Operations&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;…the number of fraction bits retained is specified by &lt;code&gt;VkPhysicalDeviceLimits::​subTexelPrecisionBits&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So, Vulkan doesn’t out-and-out &lt;em&gt;say&lt;/em&gt; that texture coordinates should be converted to a fixed-point
format, but that seems to be implied or assumed, given the specification of a number of “fraction bits”
retained.&lt;/p&gt;
&lt;p&gt;Also, in Vulkan the number of subtexel bits can be queried in the physical device properties.
That means we can use Sascha Willems’ fantastic &lt;a href=&#34;http://vulkan.gpuinfo.org/&#34;&gt;Vulkan Hardware Database&lt;/a&gt;
to get an idea of what &lt;code&gt;subTexelPrecisionBits&lt;/code&gt; values &lt;a href=&#34;http://vulkan.gpuinfo.org/displaydevicelimit.php?name=subTexelPrecisionBits&#34;&gt;are reported for actual GPUs out there&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The results as of this writing show about 89% of devices returning 8, and nearly all of the rest returning 4.
There are no devices returning more than 8.&lt;/p&gt;
&lt;figure class=&#34;max-width-50 invert-when-dark&#34; alt=&#34;Report on the distribution of subTexelPrecisionBits across Vulkan GPUs&#34; title=&#34;Report on the distribution of subTexelPrecisionBits across Vulkan GPUs&#34; &gt;
&lt;img src=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/subtexelprecisionbits.png&#34;/&gt;      &lt;figcaption&gt;&lt;p&gt;The distribution of &lt;code&gt;subTexelPrecisionBits&lt;/code&gt; as reported by the Vulkan Hardware Database. The reports of values 0 and 6 look bogus, as do most of the reports of 4.&lt;/p&gt;&lt;/figcaption&gt;
    &lt;/figure&gt;

&lt;p&gt;The Vulkan spec minimum for &lt;code&gt;subTexelPrecisionBits&lt;/code&gt; is also 4, not 8 (see
&lt;a href=&#34;https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#limits-required&#34;&gt;Table 53 – Required Limits&lt;/a&gt;).
And it seems there’s a significant minority of GPUs that have only 4 subtexel bits. Or is there?
Let’s poke at that a little further.&lt;/p&gt;
&lt;p&gt;A majority of the reports that return 4 bits seem to be from Apple platforms.
Now, Apple doesn’t implement Vulkan directly, so these must be going through &lt;a href=&#34;https://github.com/KhronosGroup/MoltenVK&#34;&gt;MoltenVK&lt;/a&gt;.
And it turns out that MoltenVK
&lt;a href=&#34;https://github.com/KhronosGroup/MoltenVK/blob/9986e92f35d957e3760fa468a53ecad3c9b86478/MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm#L2184-L2189&#34;&gt;hardcodes &lt;code&gt;subTexelPrecisionBits&lt;/code&gt; to 4&lt;/a&gt;,
at the time of this writing. The associated comment suggests that Metal doesn’t publicly
expose or specify this value, so they’re just setting it to the minimum. This value
shouldn’t be taken as meaningful!
In fact, I would bet money that all the Apple GPUs have 8 subtexel bits,
just like everyone else. (The only one I’ve tested directly is the M1, and it indeed seems to be 8.)
However, I don’t think there is any public documentation from Apple to confirm or refute this.&lt;/p&gt;
&lt;p&gt;Many other reports of 4 subtexel bits come from older Linux drivers for GPUs that definitely have 8
subtexel bits; those might also be incomplete Vulkan implementations, or some other odd
happenstance. Some Android GPUs also have both 4 and 8 reported in the database for the same GPU;
I assume 8 is the correct value for those. Finally, there are
software rasterizers such as SwiftShader and llvmpipe, which also seem to just return the spec minimum.&lt;/p&gt;
&lt;p&gt;The fact that the Vulkan spec minimum is 4, rather than 8, suggests that there are (or were) some GPUs
out there that actually only have 4 subtexel bits—or why wouldn’t the spec minimum be 8? But I
haven’t been able to find out what GPUs those could be.&lt;/p&gt;
&lt;p&gt;Moreover, there’s a very practical reason why 8 bits is the standard value!
Subtexel precision is directly related to bilinear filtering, and most textures in 3D apps are
in 8-bit-per-channel formats. If you’re going to interpolate 8-bit texture values and store them in
an 8-bit framebuffer, then you &lt;em&gt;need&lt;/em&gt; 8-bit subtexel precision; otherwise, you’re likely to see
banding whenever a texture is magnified—whenever the camera gets close to a surface. Lots of
effects like reflection cubemaps, skyboxes, and bloom filters would also be really messed up if you
had less than 8 subtexel bits!&lt;/p&gt;
&lt;p&gt;Overall, it seems very safe to assume that any GPU you’d actually want to run on will have exactly 8
bits of subtexel precision—no more, no less.&lt;/p&gt;
&lt;p&gt;What about the rounding mode? Unfortunately, as noted earlier, the Vulkan spec doesn’t actually say that
texture coordinates should be converted to fixed-point, and thus doesn’t specify rounding behavior
for that operation.&lt;/p&gt;
&lt;p&gt;Given that the D3D behavior is more tightly specified here, we can expect that behavior to hold
whenever we’re on a D3D-supporting GPU (even if we’re running with Vulkan or OpenGL on that GPU).
The question is a little trickier for other GPUs, such as Apple’s and the assorted mobile GPUs. They
don’t support D3D, so they’re under no obligation to follow D3D’s spec. That said, it seems
probable that they do also use round-to-nearest here, especially Apple. (I’d be a little more
hesitant to assume this across the board with the mobile crowd.)&lt;/p&gt;
&lt;p&gt;I can tell you that from my experiments, the 1/512 offset consistently fixes the gather mismatch
across all desktop GPU vendors, OSes, and APIs that I’ve been able to try, including Apple’s. However,
I haven’t had the chance to test this on mobile GPUs so far.&lt;/p&gt;
&lt;h2 id=&#34;interlude-nearest-filtering&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#interlude-nearest-filtering&#34; title=&#34;Permalink to this section&#34;&gt;Interlude: Nearest Filtering&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I initially followed a bit of a red herring with this investigation. I wanted to verify whether the
1/512 offset was correct across a wider range of hardware, so I created a Shadertoy to test it, and
asked people to run it and let me know the results. (By the way, thanks, everyone!)&lt;/p&gt;
&lt;p&gt;The results I got were all over the place. For some GPU vendors an offset was required, and for
others, it wasn’t. In some cases, it seemed like it might have changed between different architectures
of the same vendor. There was even some evidence that it depended on which API you were using, with
D3D and OpenGL giving different results on the same GPU—although I wasn’t able to conclusively
verify that. Oh jeez. What the heck?&lt;/p&gt;
&lt;p&gt;As it turns out, I’d taken a shortcut that was actually kind of a long-cut. You see, Shadertoy is built on
WebGL, which doesn’t actually support texture gathers currently (they’re planned to be in the next
version of WebGL). So, I substituted with something that’s similar in many ways: nearest-neighbor
filtering mode.&lt;/p&gt;
&lt;p&gt;Just like gathers, nearest-neighbor filtering also has to select a texel based on the texture unit’s
judgement of which texel square your coordinates are in, and there is again the possibility of a
mismatch versus the shader’s version of the calculation. The only difference is that there isn’t a 0.5
texel offset—otherwise, I expected it to work the same way as a gather, using the same math and
rounding modes.&lt;/p&gt;
&lt;p&gt;Surprise! It doesn’t. The results of nearest-neighbor filtering suggest that GPUs aren’t consistent
in how they compute the nearest texel to the sample point. To find the nearest texel, we need to apply
&lt;code&gt;floor&lt;/code&gt; to the scaled texel coordinates; but it looks like some GPUs round off the coordinates to
8 subtexel bits before taking the &lt;code&gt;floor&lt;/code&gt;, and others might truncate instead of rounding, or they
might just be applying &lt;code&gt;floor&lt;/code&gt; to the floating-point value directly, rather than converting it to
fixed-point at all.&lt;/p&gt;
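&lt;p&gt;To make the three candidate behaviors concrete, here is a minimal Python sketch. It assumes 8 subtexel bits and round-to-nearest for the fixed-point conversion; the function names are made up for illustration, and this is a model of the possibilities listed above, not a description of any particular GPU.&lt;/p&gt;

```python
import math

SUBTEXEL_BITS = 8              # assumed fixed-point precision
SCALE = 2 ** SUBTEXEL_BITS     # 256 fixed-point steps per texel

def nearest_float(u):
    """Apply floor directly to the floating-point texel coordinate."""
    return math.floor(u)

def nearest_rounded(u):
    """Round to the nearest 1/256 texel first, then take the floor."""
    return math.floor(u * SCALE + 0.5) // SCALE

def nearest_truncated(u):
    """Truncate to a multiple of 1/256 texel first, then take the floor."""
    return math.floor(u * SCALE) // SCALE

# A scaled texel coordinate 1/1024 below the boundary between texels 2 and 3.
u = 3.0 - 1.0 / 1024.0
print(nearest_float(u), nearest_rounded(u), nearest_truncated(u))  # prints: 2 3 2
```

&lt;p&gt;In this model, the rounding variant disagrees with the plain floating-point &lt;code&gt;floor&lt;/code&gt; only for coordinates in the last 1/512 of a texel below each texel boundary, while the truncating variant always agrees with it.&lt;/p&gt;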
&lt;p&gt;Now, the D3D11 functional spec does say (&lt;a href=&#34;https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#7.18.7%20Point%20Sample%20Addressing&#34;&gt;§7.18.7 Point Sample Addressing&lt;/a&gt;)
that point sampling (aka nearest filtering) is supposed to use the same fixed-point conversion and
rounding as in the bilinear case. And some GPUs out there are definitely in violation of that, to
the tune of 1/512th texel, unless I’ve misunderstood something!&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.shadertoy.com/view/flyGRd&#34;&gt;Here’s the Shadertoy&lt;/a&gt;, if you want to check it out (see the
code comments for an explanation).&lt;/p&gt;
&lt;p&gt;Happily, however, if you’re actually interested in gathers, the behavior of those appears to be
completely consistent. (Honestly, surprising for anything to do with GPU hardware!)&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#conclusion&#34; title=&#34;Permalink to this section&#34;&gt;Conclusion&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The inner workings of texture units are something we can usually gloss over as GPU programmers.
For the most part, once we’ve prepared the mipmaps and configured the sampler settings, things Just
Work™ and we don’t need to think about it a lot.&lt;/p&gt;
&lt;p&gt;Once in a while, though, something comes along that brings the texture unit’s internal behavior to
the fore, and this was a great example. If you ever try to build a custom filter in a shader using
texture gathers, the mismatch in the texture unit’s internal precision versus the float32
calculations in the shader will create a very noticeable visual issue.&lt;/p&gt;
&lt;p&gt;Fortuitously, we were able to get a good read on what’s going on from a close
perusal of API specs, and hardware survey data plus a few directed tests helped to confirm that gathers
really do work the way the spec says, across a wide range of GPUs. And best of all, the fix is
simple and universal once we’ve understood the problem.&lt;/p&gt;</description>
			</item>
			<item>
				<title>git-partial-submodule</title>
				<link>https://www.reedbeta.com/made/git-partial-submodule/</link>
				<guid>https://www.reedbeta.com/made/git-partial-submodule/</guid>
				<dc:creator>Nathan Reed</dc:creator>
<pubDate>Sat, 04 Sep 2021 11:47:29 -0700</pubDate><comments>https://www.reedbeta.com/made/git-partial-submodule/#comments</comments>				<description>&lt;p&gt;&lt;a class=&#34;biglink&#34; href=&#34;https://github.com/Reedbeta/git-partial-submodule/&#34;&gt;View on GitHub&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Have you ever thought about adding a submodule to your git project, but you didn’t want to bear the
burden of downloading and storing the submodule’s entire history, or you only needed a handful of
files out of the submodule?&lt;/p&gt;
&lt;p&gt;Git provides &lt;a href=&#34;https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/&#34;&gt;partial clone&lt;/a&gt;
and &lt;a href=&#34;https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/&#34;&gt;sparse checkout&lt;/a&gt;
features that can make this happen for top-level repositories, but so far they aren’t available for
submodules. That’s a hole I aimed to fill with this project. &lt;strong&gt;git-partial-submodule&lt;/strong&gt; is a tool for
setting up submodules with blobless clones. It can also save sparse-checkout patterns in your
&lt;code&gt;.gitmodules&lt;/code&gt; file, allowing them to be managed by version control, and automatically applied when
the submodules are cloned.&lt;/p&gt;
&lt;!--more--&gt;
&lt;p&gt;As a motivating example, a fresh clone of &lt;a href=&#34;https://github.com/ocornut/imgui&#34;&gt;Dear ImGui&lt;/a&gt; consumes
about 80 MB (of which 75 MB is in the &lt;code&gt;.git&lt;/code&gt; directory) and takes about 10 seconds to clone on a
fast connection. It also brings in roughly 200 files, including numerous examples and backends and
various other ancillary files. The actual ImGui implementation—the part you need for your app—is
in 11 files totaling 2.5 MB.&lt;/p&gt;
&lt;p&gt;In contrast, a blobless, sparse clone of Dear ImGui requires only about 7 MB (4.5 MB in the &lt;code&gt;.git&lt;/code&gt;
directory), takes ~2 seconds to clone, and checks out only the files you want.&lt;/p&gt;
&lt;p&gt;(This is not to pick on Dear ImGui at all! These issues arise with any healthy, long-lived project,
and the history bloat in particular is an artifact of git’s design.)&lt;/p&gt;
&lt;p&gt;One way developers might address this is by “vendoring”, or copying the ImGui files they need into
their own repository and checking them in. That can be a legitimate solution, but it has various
downsides.&lt;/p&gt;
&lt;p&gt;Another solution supported out of the box by git is “shallow” clones, which essentially only
download the latest commit and no history. Submodules can be configured to be cloned shallowly.
This works, and is useful in some cases such as cloning on a build machine where you’re not going to
be manipulating the repository at all. However, shallow clones make it difficult to do normal
development workflows with the submodule. In contrast, a blobless clone functions normally with
most workflows, as it can download missing data on demand.&lt;/p&gt;
&lt;p&gt;Since git’s own submodule commands do not (yet) allow specifying blobless mode or sparse checkout,
I built git-partial-submodule to work around this. It’s a single-file Python script that you use
just for the initial setup of submodules. Instead of &lt;code&gt;git submodule add&lt;/code&gt;, you do
&lt;code&gt;git-partial-submodule.py add&lt;/code&gt;. When cloning a repository with existing submodules, you use
&lt;code&gt;git-partial-submodule.py clone&lt;/code&gt; instead of recursively cloning or &lt;code&gt;git submodule update --init&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;It works by manually calling &lt;code&gt;git clone&lt;/code&gt; with the blobless/sparse options, setting up the submodule
repo in your &lt;code&gt;.git/modules&lt;/code&gt; directory, and hooking everything up so git sees it as a legit submodule.
Afterward, ordinary submodule operations such as fetches and updates &lt;em&gt;should&lt;/em&gt; work normally—although
I haven’t done super extensive testing on this, and I’ve been warned that blobless/sparse are still
experimental git features that may have sharp edges.&lt;/p&gt;
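&lt;p&gt;As a rough illustration of that first step, the sketch below assembles a blobless, sparse &lt;code&gt;git clone&lt;/code&gt; command line in Python. This is not the script’s actual code: the helper name and the example paths are hypothetical, and the real tool does additional work to register the result as a submodule. The flags themselves (&lt;code&gt;--filter=blob:none&lt;/code&gt;, &lt;code&gt;--sparse&lt;/code&gt;, &lt;code&gt;--separate-git-dir&lt;/code&gt;) are standard &lt;code&gt;git clone&lt;/code&gt; options.&lt;/p&gt;

```python
import shlex

def blobless_clone_command(url, worktree, git_dir):
    """Build a git clone command line for a blobless, sparse clone.

    Hypothetical helper for illustration only: it builds the argument
    list and does not run git.
    """
    return [
        "git", "clone",
        "--filter=blob:none",               # fetch file contents only on demand
        "--sparse",                         # start with a sparse checkout
        "--separate-git-dir=" + git_dir,    # keep repo data under .git/modules
        url, worktree,
    ]

cmd = blobless_clone_command(
    "https://github.com/ocornut/imgui",
    "third_party/imgui",                    # example submodule path (assumed)
    ".git/modules/third_party/imgui",       # example git dir (assumed)
)
print(shlex.join(cmd))
```

&lt;p&gt;Building the argument list separately from running it makes it easy to inspect the exact command before handing it to something like &lt;code&gt;subprocess.run&lt;/code&gt;.&lt;/p&gt;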
&lt;p&gt;The other thing git-partial-submodule does is to save and restore sparse-checkout patterns in your
&lt;code&gt;.gitmodules&lt;/code&gt; for each submodule. When you only need a subset of the submodule’s file tree, this
lets you manage those patterns under version control in the superproject, so that others who clone
the project (and are also using git-partial-submodule) will automatically get the right set of
files. You can configure this using the ordinary &lt;code&gt;git sparse-checkout&lt;/code&gt; commands, but currently you
have to remember to do the extra step of saving the patterns to &lt;code&gt;.gitmodules&lt;/code&gt; when changing them, or
restoring the patterns &lt;em&gt;from&lt;/em&gt; &lt;code&gt;.gitmodules&lt;/code&gt; after pulling/merging. This could probably be
automated further using git hooks, but I haven’t looked into it yet.&lt;/p&gt;
&lt;p&gt;I’m excited to try out this workflow for some of my own projects, replacing vendored projects with
partial submodules, and I hope it will be helpful to some others out there as well. Issues and PRs
are open on GitHub, and contributions are welcome. If you end up trying this, let me know if it
works for you!&lt;/p&gt;</description>
			</item>
			<item>
				<title>Slope Space in BRDF Theory</title>
				<link>https://www.reedbeta.com/blog/slope-space-in-brdf-theory/</link>
				<guid>https://www.reedbeta.com/blog/slope-space-in-brdf-theory/</guid>
				<dc:creator>Nathan Reed</dc:creator>
<pubDate>Fri, 16 Jul 2021 15:34:37 -0700</pubDate><comments>https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#comments</comments>					<category>Graphics</category>
					<category>Math</category>
				<description>&lt;p&gt;When you read BRDF theory papers, you’ll often see mention of &lt;em&gt;slope space&lt;/em&gt;. Sometimes, components
of the BRDF such as NDFs or masking-shadowing functions are defined in slope space, or operations
are done in slope space before being converted back to ordinary vectors or polar coordinates.
However, the meaning and intuition of slope space is rarely explained. Since it may not be obvious
exactly what slope space is, why it is useful, or how to transform things to and from it, I thought
I would write down a gentler introduction to it. &lt;!--more--&gt;&lt;/p&gt;
&lt;div class=&#34;toc&#34;&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#slope-refresher&#34;&gt;Slope Refresher&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#normals-and-slopes&#34;&gt;Normals and Slopes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#slope-space&#34;&gt;Slope Space&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#converting-to-polar-coordinates&#34;&gt;Converting to Polar Coordinates&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#properties-of-slope-space&#34;&gt;Properties of Slope Space&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#distributions-in-slope-space&#34;&gt;Distributions in Slope Space&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#the-jacobian&#34;&gt;The Jacobian&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#some-common-distributions&#34;&gt;Some Common Distributions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#conclusion&#34;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&#34;slope-refresher&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#slope-refresher&#34; title=&#34;Permalink to this section&#34;&gt;Slope Refresher&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;First off, what even is this “slope” thing we’re talking about? If you think back to your high school
algebra class, the slope of a line was defined as “rise over run”, or the ratio $\Delta y / \Delta x$
between some two points on the line.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;Slope of a line&#34; class=&#34;invert-when-dark&#34; src=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/line-slope.png&#34; style=&#34;max-height:16em&#34; title=&#34;Slope of a line&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The steeper the line, the larger the magnitude of its slope. The sign of the slope indicates which
direction the line is sloping in. The slope is infinite if the line is vertical.&lt;/p&gt;
&lt;p&gt;The concept of slope can readily be generalized to planes as well as lines. Planes have &lt;em&gt;two&lt;/em&gt; slopes,
one for $\Delta z / \Delta x$ and one for $\Delta z / \Delta y$ (using $z$-up coordinates, and
assuming the surface is not vertical):&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;Slopes of a plane&#34; class=&#34;invert-when-dark&#34; src=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/plane-slope.png&#34; style=&#34;max-height:26em&#34; title=&#34;Slopes of a plane&#34; /&gt;&lt;/p&gt;
&lt;p&gt;These values describe how much the surface rises or falls in $z$ if you take a step along either
$x$ or $y$. This completely specifies the orientation of a planar surface, as steps in any other
direction can be derived from the $x$ and $y$ slopes.&lt;/p&gt;
&lt;p&gt;In calculus, the slope of a line is generalized to the derivative or “instantaneous slope” of a curve,
$\mathrm{d}y/\mathrm{d}x$. For curved surfaces, so long as they can be expressed as a heightfield
(where $z$ is a function of $x, y$), slopes become partial derivatives $\partial z / \partial x$ and
$\partial z / \partial y$.&lt;/p&gt;
&lt;p&gt;It’s worth noting that slopes are completely &lt;em&gt;coordinate-dependent&lt;/em&gt; quantities. If you transform
to a different coordinate system, the slopes of $z$ with respect to $x, y$ will be totally different
values, or even infinite (if the surface is not a heightfield anymore in the new coordinates).&lt;/p&gt;
&lt;h2 id=&#34;normals-and-slopes&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#normals-and-slopes&#34; title=&#34;Permalink to this section&#34;&gt;Normals and Slopes&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We usually describe surfaces in 3D by their normal vector rather than their slopes, as the normal is
able to gracefully handle surfaces in any orientation without infinities, and is easier to transform
into different coordinate systems. However, there is a simple relationship between a surface’s
normal and its slopes, as this diagram should hopefully convince you:&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;Normal vector compared with slope&#34; class=&#34;invert-when-dark&#34; src=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/normal-slope.png&#34; style=&#34;max-height:20em&#34; title=&#34;Normal vector compared with slope&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The two triangles with the dotted lines in the figure are congruent (same angles and sizes), but
rotated by 90 degrees. As the normal is, by definition, perpendicular to the surface, the normal’s
components have the same proportionality as coordinate deltas along the surface, just swapped around.
This diagram shows the $xz$ projection, but the same holds true of the $yz$ components:
$$
\begin{aligned}
    \frac{\Delta z}{\Delta x} &amp;amp;= -\frac{\mathbf{n}_x}{\mathbf{n}_z} \\[1em]
    \frac{\Delta z}{\Delta y} &amp;amp;= -\frac{\mathbf{n}_y}{\mathbf{n}_z}
\end{aligned}
$$
The negative sign is because $\Delta z$ is going down while $\mathbf{n}_z$ is going up (or vice
versa, depending on the orientation).&lt;/p&gt;
&lt;p&gt;Just for completeness, when you have a heightfield surface $z(x, y)$, the partial derivatives are
related to its normal at a point in the same way:
$$
\begin{aligned}
    \frac{\partial z}{\partial x} &amp;amp;= -\frac{\mathbf{n}_x}{\mathbf{n}_z} \\[1em]
    \frac{\partial z}{\partial y} &amp;amp;= -\frac{\mathbf{n}_y}{\mathbf{n}_z}
\end{aligned}
$$&lt;/p&gt;
&lt;h2 id=&#34;slope-space&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#slope-space&#34; title=&#34;Permalink to this section&#34;&gt;Slope Space&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Now we’re finally ready to define slope space. Due to the relationship between slopes and normal
vectors, slopes act as an alternate parameterization of unit vectors in the $z &amp;gt; 0$ hemisphere.
Given any vector, we can treat it as a normal and find the slopes of a surface perpendicular to it.
“Slope space” refers to this domain: the 2D space of all the possible slope values. As slopes can be
any real numbers, slope space is just the real plane, $\mathbb{R}^2$, but with a special meaning.&lt;/p&gt;
&lt;p&gt;A good way to visualize slope space is to identify it with the plane $z = 1$. Then, vectors at the
origin can be converted to slope space by intersecting them with the plane:&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;Slope space as the z=1 plane&#34; class=&#34;invert-when-dark&#34; src=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/normal-z-1.png&#34; style=&#34;max-height:16em&#34; title=&#34;Slope space as the z=1 plane&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Here I’ve introduced the notation $\tilde{\mathbf{n}}$ for the 2D vector in slope space corresponding
to the 3D vector $\mathbf{n}$. The tilde ($\sim$) notation for slope-space quantities is commonly
used in the BRDF literature, and I’ll follow it here.&lt;/p&gt;
&lt;p&gt;Intersecting a ray with the $z = 1$ plane is equivalent to rescaling the vector so that $\mathbf{n}_z = 1$,
and then the slopes can be read off as the negated $x, y$ components of the rescaled vector. You can
visualize the slope plane as having inverted $x, y$ axes compared to the base coordinates to take
care of this. (Note the $x$-axis on the slope plane, pointing to the left, in the diagram above.)&lt;/p&gt;
&lt;p&gt;So, you can picture the hemisphere being blown up and stretched onto the plane, by projecting each
point away from the origin until it hits the plane. This establishes a bijection (one-to-one mapping)
between the unit vectors with $z &amp;gt; 0$ and points on the plane.&lt;/p&gt;
&lt;p&gt;To make it official, the slope-space parameterization of an arbitrary vector $\mathbf{v}$ with
$\mathbf{v}_z &amp;gt; 0$ is defined by:
$$
\begin{aligned}
    \tilde{\mathbf{v}}_x &amp;amp;= -\frac{\mathbf{v}_x}{\mathbf{v}_z} \\[1em]
    \tilde{\mathbf{v}}_y &amp;amp;= -\frac{\mathbf{v}_y}{\mathbf{v}_z}
\end{aligned}
$$
This assumes that the vector is upward-pointing, so that $\mathbf{v}_z &amp;gt; 0$. Finite slopes cannot
represent horizontal vectors (normal to vertical surfaces), and they cannot distinguish between
upward- and downward-pointing vectors, as slopes have no sense of orientation—reverse the normal,
and you still get the same slopes.&lt;/p&gt;
&lt;p&gt;Converting back from slopes to an ordinary unit normal vector is also simple:
$$
    \mathbf{v} = \text{normalize}(-\tilde{\mathbf{v}}_x, -\tilde{\mathbf{v}}_y, 1)
$$&lt;/p&gt;
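&lt;p&gt;The two conversions above can be sketched in a few lines of Python. The function names &lt;code&gt;to_slope&lt;/code&gt; and &lt;code&gt;from_slope&lt;/code&gt; are just illustrative choices, and the input to &lt;code&gt;to_slope&lt;/code&gt; is assumed to have a positive $z$ component.&lt;/p&gt;

```python
import math

def to_slope(v):
    """Slope-space coordinates of a 3D vector with positive z."""
    x, y, z = v
    return (-x / z, -y / z)

def from_slope(s):
    """Unit vector for a slope-space point: normalize(-sx, -sy, 1)."""
    sx, sy = s
    inv_len = 1.0 / math.sqrt(sx * sx + sy * sy + 1.0)
    return (-sx * inv_len, -sy * inv_len, inv_len)

n = (0.6, 0.0, 0.8)     # unit normal tilted toward +x
s = to_slope(n)         # x slope is approximately -0.75
print(s, from_slope(s))
```

&lt;p&gt;Round-tripping a unit vector through &lt;code&gt;to_slope&lt;/code&gt; and &lt;code&gt;from_slope&lt;/code&gt; should reproduce it up to floating-point error. Note that &lt;code&gt;to_slope&lt;/code&gt; does not care about the length of its input, since the division by $z$ cancels any overall scale.&lt;/p&gt;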
&lt;h2 id=&#34;converting-to-polar-coordinates&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#converting-to-polar-coordinates&#34; title=&#34;Permalink to this section&#34;&gt;Converting to Polar Coordinates&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Another common parameterization of unit vectors is the polar coordinates $\theta, \phi$.
It’s straightforward to work out the direct conversion between slope space and polar coordinates.&lt;/p&gt;
&lt;p&gt;Following common conventions, we define the polar coordinates so that $\theta$ measures downward
from the $+z$ axis, and $\phi$ measures counterclockwise from the $+x$ axis. The conversion between
polar and 3D unit vectors is:
$$
\begin{aligned}
    \theta &amp;amp;= \text{acos}(z) \\
    \phi &amp;amp;= \text{atan2}(y, x)
\end{aligned}
\qquad
\begin{aligned}
    x &amp;amp;= \sin\theta \cos\phi \\
    y &amp;amp;= \sin\theta \sin\phi \\
    z &amp;amp;= \cos\theta
\end{aligned}
$$
and the conversion between polar and slope space is:
$$
\begin{aligned}
    \theta &amp;amp;= \text{atan}(\sqrt{\tilde x^2 + \tilde y^2}) \\
    \phi &amp;amp;= \text{atan2}(-\tilde y, -\tilde x)
\end{aligned}
\qquad
\begin{aligned}
    \tilde x &amp;amp;= -\!\tan\theta \cos\phi \\
    \tilde y &amp;amp;= -\!\tan\theta \sin\phi
\end{aligned}
$$
This can be derived by setting $\tilde x = -x/z$ and substituting the conversion from polar, then
using the identity $\sin/\cos = \tan$.&lt;/p&gt;
&lt;p&gt;A fact worth noting here is that the magnitude of a slope-space vector, $|\tilde{\mathbf{v}}|$, is
equal to $\tan\theta_\mathbf{v}$.&lt;/p&gt;
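&lt;p&gt;Here is a small Python sketch of the slope-space and polar conversions, with a check of the magnitude fact. The function names are hypothetical, and $\theta$ is assumed to be strictly less than 90 degrees so that the tangent stays finite.&lt;/p&gt;

```python
import math

def polar_to_slope(theta, phi):
    """Slope-space point for polar angles (theta from +z, phi from +x)."""
    t = math.tan(theta)
    return (-t * math.cos(phi), -t * math.sin(phi))

def slope_to_polar(sx, sy):
    """Polar angles for a slope-space point."""
    theta = math.atan(math.hypot(sx, sy))
    phi = math.atan2(-sy, -sx)
    return (theta, phi)

theta, phi = 0.5, 2.0
sx, sy = polar_to_slope(theta, phi)

# The length of the slope-space vector should match tan(theta).
print(math.hypot(sx, sy), math.tan(theta))
# Converting back should recover the original angles.
print(slope_to_polar(sx, sy))
```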
&lt;h2 id=&#34;properties-of-slope-space&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#properties-of-slope-space&#34; title=&#34;Permalink to this section&#34;&gt;Properties of Slope Space&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Now we’ve seen how to define slope space and convert back and forth from it. But why is it useful?
Why would we want to represent vectors or functions in this way?&lt;/p&gt;
&lt;p&gt;In microfacet BRDF theory, we usually assume the microsurface is a heightfield for simplicity (which
is a pretty reasonable assumption for a lot of everyday materials). If the microsurface is a
heightfield, then its normals are constrained to the upper hemisphere. Slope space, which
parameterizes exactly the upper hemisphere, is a good match for this.&lt;/p&gt;
&lt;p&gt;From a performance perspective, slope space is also much cheaper to transform to and from than polar
coordinates, which makes it nicer to use in shaders. It requires only some divides or a normalize,
as opposed to a bunch of forward or inverse trigonometric functions.&lt;/p&gt;
&lt;p&gt;Slope space also has no boundaries, in contrast to other representations of unit vectors. The origin
(0, 0) of the slope plane represents a flat surface normal, and the farther away you get, the more
extreme the slope, but you can’t make the surface turn upside down or produce an invalid normal. So,
you can freely do various manipulations on vectors in slope space without worrying about exceeding
any bounds.&lt;/p&gt;
&lt;p&gt;Another useful fact about slope space is that many linear transformations of a surface, such as
scaling or shearing, map to transformations of its slope space in simple ways. For example, scaling
a surface by a factor $\alpha$ along its $z$-axis causes its normal vectors’ $z$-components to scale
by $1/\alpha$ (due to normals taking the inverse transpose), but then since $\mathbf{n}_z$ is in the
denominator in the definition of slope space, we have that the slopes of the surface are scaled by
$\alpha$.&lt;/p&gt;
&lt;p&gt;Here’s a table of how transformations of the microsurface map to transformations of slope space:&lt;/p&gt;
&lt;table&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th&gt;Surface&lt;/th&gt;
            &lt;th&gt;Slope Space&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;Horizontal scale by $(\alpha_x, \alpha_y)$&lt;/td&gt;
            &lt;td&gt;Scale by $(1/\alpha_x, 1/\alpha_y)$&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Vertical scale by $\alpha$&lt;/td&gt;
            &lt;td&gt;Scale by $\alpha$&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Horizontal rotate ($xy$) by $\theta$&lt;/td&gt;
            &lt;td&gt;Rotate by $\theta$&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Vertical rotate ($xz, yz$)&lt;/td&gt;
            &lt;td&gt;Projective transform&lt;br/&gt;&lt;em&gt;(not recommended)&lt;/em&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Horizontal shear ($xy$) by
                $\begin{bmatrix}
                    1 &amp; k_2 \\
                    k_1 &amp; 1
                \end{bmatrix}$
            &lt;/td&gt;
            &lt;td&gt;Shear by
                $\begin{bmatrix}
                    1 &amp; -k_1 \\
                    -k_2 &amp; 1
                \end{bmatrix}$
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Vertical shear by
                $\begin{bmatrix}
                    1 &amp; 0 &amp; 0 \\
                    0 &amp; 1 &amp; 0 \\
                    k_x &amp; k_y &amp; 1
                \end{bmatrix}$
            &lt;/td&gt;
            &lt;td&gt;Translate by $(k_x, k_y)$&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Vertical shear by
                $\begin{bmatrix}
                    1 &amp; 0 &amp; k_x \\
                    0 &amp; 1 &amp; k_y \\
                    0 &amp; 0 &amp; 1
                \end{bmatrix}$
            &lt;/td&gt;
            &lt;td&gt;Projective transform&lt;br/&gt;&lt;em&gt;(not recommended)&lt;/em&gt;&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;
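&lt;p&gt;A few rows of this table can be spot-checked numerically. The Python sketch below takes three points on the plane $z = 0.5x - 0.25y$, applies a transformation to the points, and recomputes the slopes from the transformed plane. The helper names and the specific test values are arbitrary choices for illustration.&lt;/p&gt;

```python
def plane_slopes(p0, p1, p2):
    """Slopes (dz/dx, dz/dy) of the plane through three points."""
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    # The normal is the cross product of the two edge vectors.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    return (-nx / nz, -ny / nz)

def transformed_slopes(transform):
    """Transform three points on the test plane and return the new slopes."""
    points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 1.0, -0.25)]
    return plane_slopes(*[transform(p) for p in points])

# Untransformed plane: slopes are (0.5, -0.25).
base = transformed_slopes(lambda p: p)
# Vertical scale by 2: slopes scale by 2.
vscale = transformed_slopes(lambda p: (p[0], p[1], 2.0 * p[2]))
# Horizontal scale by (2, 4): slopes scale by (1/2, 1/4).
hscale = transformed_slopes(lambda p: (2.0 * p[0], 4.0 * p[1], p[2]))
# Vertical shear with (k_x, k_y) = (0.1, 0.2): slopes translate by (0.1, 0.2).
vshear = transformed_slopes(lambda p: (p[0], p[1], p[2] + 0.1 * p[0] + 0.2 * p[1]))

print(base, vscale, hscale, vshear)
```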

&lt;p&gt;These transformations in slope space are often exploited by parameterized BRDF models; they can
implement roughness, anisotropy, and such as transformations applied to a single canonical BRDF
(see for example &lt;a href=&#34;http://jcgt.org/published/0003/02/03/&#34;&gt;Heitz 2014&lt;/a&gt;, section 5).&lt;/p&gt;
&lt;h2 id=&#34;distributions-in-slope-space&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#distributions-in-slope-space&#34; title=&#34;Permalink to this section&#34;&gt;Distributions in Slope Space&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One of the key ingredients in a microfacet BRDF is its normal distribution function (NDF), and one
of the key uses for slope space is defining NDFs. Because slope space is an unbounded 2D plane, we
can import existing 1D or 2D distribution functions and manipulate them in various ways, just as we
would in any 2D domain. As long as we end up with a valid, normalized probability distribution in
the slope plane (sometimes called a slope distribution function, or a $P^{22}$ function—I’m not
sure where the latter term comes from), we can transform it to a properly normalized NDF expressed in
polar or vector form. Let’s see how to do that.&lt;/p&gt;
&lt;h3 id=&#34;the-jacobian&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#the-jacobian&#34; title=&#34;Permalink to this section&#34;&gt;The Jacobian&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;When mapping distribution functions from one space to another, it’s important to remember that the
values of these functions are not dimensionless numbers; they are &lt;em&gt;densities&lt;/em&gt; with respect to the area
or volume measure of the underlying space. Therefore, it’s not enough just to change variables to
express the function in the new coordinates; you also have to correct for the way the mapping
stretches or squeezes the volume, which can vary from place to place.&lt;/p&gt;
&lt;p&gt;Symbolically, suppose we have a domain $A$ with a probability density $p(a)$ defined on it. We want
to map this to a domain $B$ parameterized by some new coordinates $b$. What we want is &lt;em&gt;not&lt;/em&gt; just
$p(a) = p(b)$ when $a \mapsto b$ under the mapping. Rather, we need to maintain:
$$
    p(a) \, \mathrm{d}A = p(b) \, \mathrm{d}B
$$
where $\mathrm{d}A, \mathrm{d}B$ are matching volume elements of the respective spaces, with
$\mathrm{d}A \mapsto \mathrm{d}B$ under the mapping we’re using. This says that the amount of
probability (or whatever thing whose density we’re measuring) in the infinitesimal volume $\mathrm{d}A$
is conserved under the mapping; the same amount of probability is present in $\mathrm{d}B$.&lt;/p&gt;
&lt;p&gt;This equation can be rewritten:
$$
    p(b) = p(a) \frac{\mathrm{d}A}{\mathrm{d}B}
$$
The factor $\mathrm{d}A / \mathrm{d}B$ here is called the Jacobian, referring to the determinant of
the &lt;a href=&#34;https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant&#34;&gt;Jacobian matrix&lt;/a&gt; which contains
all the derivatives of the change of variables from $a$ to $b$. Actually, this is the &lt;em&gt;inverse&lt;/em&gt;
Jacobian, as the forward Jacobian for $A \to B$ would be $\mathrm{d}B / \mathrm{d}A$. The forward
Jacobian is the factor by which the mapping stretches or squeezes volumes locally around a point.
Because a probability density has volume in the denominator, it transforms using the inverse Jacobian.&lt;/p&gt;
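&lt;p&gt;As a one-dimensional sanity check of this rule, consider a uniform density on $[0, 1]$ mapped through $b = a^2$. This example is not from the BRDF setting; it is just the simplest case I could think of, and the helper names are made up. The inverse Jacobian is $\mathrm{d}A/\mathrm{d}B = 1/(2\sqrt{b})$, and the probability of an interval should come out the same whether it is measured in $A$ or in $B$.&lt;/p&gt;

```python
import math

def p_a(a):
    """Uniform density on the interval [0, 1]."""
    return 1.0

def p_b(b):
    """Density after mapping b = a**2, using the inverse Jacobian dA/dB."""
    a = math.sqrt(b)
    da_db = 1.0 / (2.0 * math.sqrt(b))
    return p_a(a) * da_db

def integrate(f, lo, hi, n=10000):
    """Midpoint-rule integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Probability of a in [0.2, 0.5], and of the matching interval of b values.
prob_a = integrate(p_a, 0.2, 0.5)
prob_b = integrate(p_b, 0.2 ** 2, 0.5 ** 2)
print(prob_a, prob_b)   # both should be close to 0.3
```

&lt;p&gt;Without the Jacobian factor, the second integral would just be the width of the $b$ interval, 0.21, and probability would not be conserved.&lt;/p&gt;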
&lt;p&gt;So, when converting a slope-space distribution to an NDF, we have to multiply by the appropriate
Jacobian. But how do we find out what that is? First off, we have to recall that NDFs are defined
not as a density over solid angle in the hemisphere, but
&lt;a href=&#34;/blog/hows-the-ndf-really-defined/&#34;&gt;as a density over projected area on the $xy$ plane&lt;/a&gt;.
Thus, it’s not enough to just find the Jacobian from slope space to polar coordinates; we also need
to find the Jacobian from polar coordinates to projected area.&lt;/p&gt;
&lt;p&gt;To do this, I find it easiest to use the formalism of &lt;a href=&#34;https://en.wikipedia.org/wiki/Differential_form&#34;&gt;differential forms&lt;/a&gt;.
Explaining how those work is out of the scope of this article, but
&lt;a href=&#34;https://www.math.purdue.edu/~arapura/preprints/diffforms.pdf&#34;&gt;here’s an exposition I found useful&lt;/a&gt;.
They’re essentially fields of &lt;a href=&#34;/blog/normals-inverse-transpose-part-3/&#34;&gt;dual $k$-vectors&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;First, we can write down the $xy$ projected area element, $\mathrm{d}x \wedge \mathrm{d}y$, in terms
of polar coordinates by differentiating the mapping from polar to Cartesian, which I’ll repeat here
for convenience:
$$
\begin{gathered}
    \left\{
    \begin{aligned}
        x &amp;amp;= \sin\theta \cos\phi \\
        y &amp;amp;= \sin\theta \sin\phi \\
        z &amp;amp;= \cos\theta
    \end{aligned}
    \right. \\[2em]
    \begin{aligned}
        \mathrm{d}x \wedge \mathrm{d}y
        &amp;amp;= (\cos\theta\cos\phi\,\mathrm{d}\theta - \sin\theta\sin\phi\,\mathrm{d}\phi) \ \wedge \\
            &amp;amp;\qquad (\cos\theta\sin\phi\,\mathrm{d}\theta + \sin\theta\cos\phi\,\mathrm{d}\phi) \\[0.5em]
        &amp;amp;= \cos\theta\sin\theta\cos^2\phi\,(\mathrm{d}\theta \wedge \mathrm{d}\phi) \ - \\
            &amp;amp;\qquad \cos\theta\sin\theta\sin^2\phi\,(\mathrm{d}\phi \wedge \mathrm{d}\theta) \\[0.5em]
        &amp;amp;= \cos\theta\sin\theta\,(\mathrm{d}\theta \wedge \mathrm{d}\phi)
    \end{aligned}
\end{gathered}
$$
Then, we can do the same thing with the slope-space area element:
$$
\begin{gathered}
    \left\{
    \begin{aligned}
    \tilde x &amp;amp;= -\!\tan\theta \cos\phi \\
    \tilde y &amp;amp;= -\!\tan\theta \sin\phi \\
    \end{aligned}
    \right. \\[1.5em]
    \begin{aligned}
        \mathrm{d}\tilde x \wedge \mathrm{d} \tilde y
        &amp;amp;= -(\cos^{-2}\theta\cos\phi\,\mathrm{d}\theta - \tan\theta\sin\phi\,\mathrm{d}\phi) \ \wedge \\
            &amp;amp;\qquad -(\cos^{-2}\theta\sin\phi\,\mathrm{d}\theta + \tan\theta\cos\phi\,\mathrm{d}\phi) \\[0.5em]
        &amp;amp;= \tan\theta\cos^{-2}\theta\cos^2\phi\,(\mathrm{d}\theta \wedge \mathrm{d}\phi) \ - \\
            &amp;amp;\qquad \tan\theta\cos^{-2}\theta\sin^2\phi\,(\mathrm{d}\phi \wedge \mathrm{d}\theta) \\[0.5em]
        &amp;amp;= \frac{\tan\theta}{\cos^2\theta} \, (\mathrm{d}\theta \wedge \mathrm{d}\phi)
    \end{aligned}
\end{gathered}
$$
Now, all we have to do is divide:
$$
\begin{aligned}
    \frac{\mathrm{d}\tilde x \wedge \mathrm{d} \tilde y}{\mathrm{d}x \wedge \mathrm{d}y} &amp;amp;=
        \frac{\tan\theta}{\cos^2\theta} \frac{1}{\cos\theta\sin\theta} \\[1em]
    &amp;amp;= \frac{1}{\cos^4\theta}
\end{aligned}
$$
Et voilà! The Jacobian for converting densities from slope space to NDF form is $1/\cos^4\theta$.
We’ll have to multiply by this factor in addition to changing variables.&lt;/p&gt;
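&lt;p&gt;If you’d rather not trust the algebra, the result can be checked numerically. The Python sketch below (helper names are my own, and the step size is an arbitrary choice) estimates how much each mapping scales a small patch of $(\theta, \phi)$ area using central differences, then takes the ratio of the two scale factors.&lt;/p&gt;

```python
import math

def projected(theta, phi):
    # (x, y) of the unit vector: projected area coordinates.
    return (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi))

def slope(theta, phi):
    # Slope-space coordinates of the same unit vector.
    return (-math.tan(theta) * math.cos(phi), -math.tan(theta) * math.sin(phi))

def area_scale(mapping, theta, phi, h=1e-5):
    # Determinant of the 2x2 Jacobian matrix of `mapping` with respect to
    # (theta, phi), estimated with central differences.
    u_t0, v_t0 = mapping(theta - h, phi)
    u_t1, v_t1 = mapping(theta + h, phi)
    u_p0, v_p0 = mapping(theta, phi - h)
    u_p1, v_p1 = mapping(theta, phi + h)
    du_dt = (u_t1 - u_t0) / (2.0 * h)
    dv_dt = (v_t1 - v_t0) / (2.0 * h)
    du_dp = (u_p1 - u_p0) / (2.0 * h)
    dv_dp = (v_p1 - v_p0) / (2.0 * h)
    return du_dt * dv_dp - du_dp * dv_dt

def slope_to_ndf_jacobian(theta, phi):
    # Ratio of the slope-space area element to the projected area element.
    return area_scale(slope, theta, phi) / area_scale(projected, theta, phi)

theta, phi = 0.7, 1.3
print(slope_to_ndf_jacobian(theta, phi), 1.0 / math.cos(theta) ** 4)
```

&lt;p&gt;The two printed numbers should agree to several digits, and the agreement shouldn’t depend on $\phi$.&lt;/p&gt;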
&lt;h3 id=&#34;some-common-distributions&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#some-common-distributions&#34; title=&#34;Permalink to this section&#34;&gt;Some Common Distributions&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As an example of the conversion from slope space to NDF, let’s take the standard (bivariate)
Gaussian distribution defined on slope space:
$$
    D(\tilde{\mathbf{m}}, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{|\tilde{\mathbf{m}}|^2}{2\sigma^2}\right)
$$
To turn this into an NDF, we need to change variables from $\tilde{\mathbf{m}}$ to $(\theta_\mathbf{m}, \phi_\mathbf{m})$,
and also multiply by the Jacobian $1/\cos^4\theta_\mathbf{m}$. Recalling that $|\tilde{\mathbf{m}}| = \tan\theta_\mathbf{m}$,
this becomes:
$$
    D(\mathbf{m}, \sigma) = \frac{1}{2\pi\sigma^2\cos^4\theta_\mathbf{m}} \exp\left(-\frac{\tan^2\theta_\mathbf{m}}{2\sigma^2}\right)
$$
Hey, that looks familiar—it’s the Beckmann NDF! (Although it’s more usually seen with the roughness
parameter $\alpha = \sqrt{2}\sigma$.) The Beckmann distribution is a Gaussian in slope space.&lt;/p&gt;
&lt;p&gt;The isotropic GGX NDF (&lt;a href=&#34;https://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf&#34;&gt;Walter et al 2007&lt;/a&gt;)
looks like this:
$$
    D(\mathbf{m}, \alpha) = \frac{\alpha^2}{\pi \cos^4\theta_\mathbf{m} \bigl(\alpha^2 + \tan^2\theta_\mathbf{m} \bigr)^2 }
$$
You might now recognize those familiar-looking $\cos^4\theta_\mathbf{m}$ and $\tan\theta_\mathbf{m}$
factors. Yep, this NDF is also a convert from slope space! Working backwards, we can see that it was
originally:
$$
    D(\tilde{\mathbf{m}}, \alpha) = \frac{\alpha^2}{\pi \bigl(\alpha^2 + |\tilde{\mathbf{m}}|^2 \bigr)^2 }
$$
Although this formula is probably less familiar, it matches the pdf of the bivariate
&lt;a href=&#34;https://en.wikipedia.org/wiki/Multivariate_t-distribution&#34;&gt;Student’s &lt;span style=&#34;white-space:nowrap&#34;&gt;$t$-distribution&lt;/span&gt;&lt;/a&gt; with the
“normality” parameter $\nu$ set to 2, and scaled by $\alpha/\sqrt{2}$. (You can also create a family of NDFs
that interpolate between GGX and Beckmann, by exposing a user parameter that controls $\nu$; see
&lt;a href=&#34;https://mribar03.bitbucket.io/projects/eg_2017/distribution.pdf&#34;&gt;Ribardière et al 2017&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;(Incidentally, the GGX NDF is often seen written in this alternate form:
$$
    D(\mathbf{m}, \alpha) = \frac{\alpha^2}{\pi \bigl( (\alpha^2 - 1)\cos^2\theta_\mathbf{m} + 1 \bigr)^2 }
$$
This is the same function as the form above (which is from the original GGX paper), but rearranged
to make it cheaper to evaluate, as it eliminates the $\tan^2$ using the identity
&lt;span style=&#34;white-space:nowrap&#34;&gt;$\tan^2 = (1 - \cos^2)/\cos^2$&lt;/span&gt;. However, this form also
introduces numerical precision problems, and &lt;a href=&#34;https://github.com/google/filament&#34;&gt;Filament&lt;/a&gt; has a
&lt;a href=&#34;https://google.github.io/filament/Filament.html#materialsystem/specularbrdf/normaldistributionfunction(speculard)&#34;&gt;numerically stable form&lt;/a&gt;:
$$
    D(\mathbf{m}, \alpha) = \frac{\alpha^2}{\pi \bigl(\alpha^2 \cos^2\theta_\mathbf{m} + \sin^2\theta_\mathbf{m} \bigr)^2 }
$$
which is &lt;em&gt;again&lt;/em&gt; the same function, rearranged some more; you’re meant to calculate $\sin^2\theta_\mathbf{m}$
as the squared magnitude of the cross product $|\mathbf{n} \times \mathbf{m}|^2$. This has nothing to
do with slope space; I just thought it was neat and worth knowing.)&lt;/p&gt;
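&lt;p&gt;To convince yourself that these really are all the same function, here’s a small Python sketch that transcribes the formulas from this section and compares them at one sample point. The function names are mine, and the third function is a direct transcription of the last formula above, not Filament’s actual shader code.&lt;/p&gt;

```python
import math

def ggx_slope_space(slope_len, alpha):
    # GGX as a density over the slope plane.
    return alpha**2 / (math.pi * (alpha**2 + slope_len**2) ** 2)

def ggx_original(cos_theta, alpha):
    # Form from the original GGX paper, using tan^2 = (1 - cos^2) / cos^2.
    tan2 = (1.0 - cos_theta**2) / cos_theta**2
    return alpha**2 / (math.pi * cos_theta**4 * (alpha**2 + tan2) ** 2)

def ggx_rearranged(cos_theta, alpha):
    # Cheaper form with the tangent eliminated.
    return alpha**2 / (math.pi * ((alpha**2 - 1.0) * cos_theta**2 + 1.0) ** 2)

def ggx_sin_cos(cos_theta, sin2_theta, alpha):
    # Third form; sin2_theta is passed in separately, standing in for |n x m|^2.
    return alpha**2 / (math.pi * (alpha**2 * cos_theta**2 + sin2_theta) ** 2)

theta, alpha = 0.4, 0.3
c, s = math.cos(theta), math.sin(theta)

# Slope-space density times the Jacobian 1 / cos^4 gives the NDF.
from_slope = ggx_slope_space(math.tan(theta), alpha) / c**4
print(from_slope, ggx_original(c, alpha), ggx_rearranged(c, alpha), ggx_sin_cos(c, s * s, alpha))
```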
&lt;h2 id=&#34;conclusion&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#conclusion&#34; title=&#34;Permalink to this section&#34;&gt;Conclusion&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;To recap, the most important thing to take away about slope space is that it provides an alternate
representation for unit vectors in the upper hemisphere, by projecting them out onto an infinite
plane. This enables us to work with distributions in plain old 2D space, and then map them back into
functions on the hemisphere. Slope space also provides convenient mappings from some linear
transformations of the microsurface to linear or affine transformations in the slope plane.&lt;/p&gt;
&lt;p&gt;I hope this has demystified the concept of slope space a little bit, and now you won’t be confused
by it anymore when reading BRDF papers! 😄&lt;/p&gt;</description>
			</item>
			<item>
				<title>Hash Functions for GPU Rendering</title>
				<link>https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/</link>
				<guid>https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/</guid>
				<dc:creator>Nathan Reed</dc:creator>
<pubDate>Fri, 21 May 2021 17:52:07 -0700</pubDate><comments>https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/#comments</comments>					<category>Coding</category>
					<category>GPU</category>
					<category>Graphics</category>
				<description>&lt;p&gt;Back in 2013, I wrote a &lt;a href=&#34;/blog/quick-and-easy-gpu-random-numbers-in-d3d11/&#34;&gt;somewhat popular article&lt;/a&gt;
about pseudorandom number generation on the GPU. In the eight years since, a number of new PRNGs and
hash functions have been developed; and a few months ago, an excellent paper on the topic appeared
in JCGT: &lt;a href=&#34;http://jcgt.org/published/0009/03/02/&#34;&gt;Hash Functions for GPU Rendering&lt;/a&gt;, by Mark Jarzynski
and Marc Olano. I thought it was time to update my former post in light of this paper’s findings.&lt;/p&gt;
&lt;!--more--&gt;
&lt;p&gt;Jarzynski and Olano’s paper compares GPU implementations of a large number of different hash functions
along dual axes of performance (measured by time to render a quad evaluating the hash at each pixel)
and statistical quality (quantified by the count of failures of
&lt;a href=&#34;https://en.wikipedia.org/wiki/TestU01&#34;&gt;TestU01 “BigCrush”&lt;/a&gt; tests). Naturally, there is quite a
spread of results in both performance and quality. Jarzynski and Olano then identify the few hash
functions that lie along the Pareto frontier—meaning they are the best choices along the whole
spectrum of performance/quality trade-offs.&lt;/p&gt;
&lt;p&gt;When choosing a hash function, we might sometimes prioritize performance, and other times might
prefer to sacrifice performance in favor of higher quality (real-time versus offline applications,
for example). The Pareto frontier provides the set of optimal choices for any point along that
balance—ranging from LCGs at the extreme performance-oriented end, to some quite expensive but
very high-quality hashes at the other end.&lt;/p&gt;
&lt;p&gt;In my 2013 article, I recommended the “Wang hash” as a general-purpose 32-bit-to-32-bit integer hash
function. The Wang hash was among those tested by Jarzynski and Olano, but unfortunately it did not
lie along the Pareto frontier—not even close! The solution that dominates it—and one of the best
balanced choices between performance and quality overall—is &lt;strong&gt;PCG&lt;/strong&gt;. In particular, the 32-bit PCG
hash used by Jarzynski and Olano goes as follows:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;kt&#34;&gt;uint&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;pcg_hash&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;uint&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;input&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;uint&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;state&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;input&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;747796405&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;u&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;+&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;2891336453&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;u&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;uint&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;word&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;((&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;state&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;((&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;state&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;28&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;u&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;+&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;4&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;u&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;^&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;state&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;277803737&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;u&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;return&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;word&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;22&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;u&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;^&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;word&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This has slightly better performance and &lt;em&gt;much&lt;/em&gt; better statistical quality than the Wang hash. It’s
fast enough to be useful for real-time, while also being high-quality enough for almost any graphics
use case (if you’re not using precomputed blue noise or low-discrepancy sequences). It should
probably be your default GPU hash function.&lt;/p&gt;
&lt;p&gt;Just to prove it works, here’s the bit pattern generated by a few thousand invocations of the above
function on consecutive inputs:&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;Bit pattern generated by PCG hash&#34; src=&#34;https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/pcg.png&#34; title=&#34;Bit pattern generated by PCG hash&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Yep, looks random! 👍&lt;/p&gt;
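&lt;p&gt;If you want to poke at the hash on the CPU (say, to generate a test image like the one above), here’s a rough Python port. Python integers don’t wrap around, so this sketch emulates 32-bit unsigned arithmetic with a modulo. The &lt;code&gt;to_unit_float&lt;/code&gt; helper is my own addition for illustration and isn’t part of the paper’s code.&lt;/p&gt;

```python
WRAP = 2**32  # emulate 32-bit unsigned wraparound

def pcg_hash(value):
    # LCG step, then the RXS-M-XS permutation, mirroring the shader code.
    state = (value * 747796405 + 2891336453) % WRAP
    word = (((state >> ((state >> 28) + 4)) ^ state) * 277803737) % WRAP
    return (word >> 22) ^ word

def to_unit_float(h):
    # Map a 32-bit hash value to a float in [0, 1).
    return h / WRAP

# Hash a run of consecutive inputs, as you would for consecutive pixel indices.
hashes = [pcg_hash(i) for i in range(8)]
print([to_unit_float(h) for h in hashes])
```

&lt;p&gt;Since both the LCG step and the permutation are invertible on 32-bit values, distinct inputs always produce distinct outputs.&lt;/p&gt;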
&lt;h2 id=&#34;pcg-variants&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/#pcg-variants&#34; title=&#34;Permalink to this section&#34;&gt;PCG Variants&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Incidentally, you might notice that the PCG function posted above doesn’t match that found in other
sources, such as the &lt;a href=&#34;https://www.pcg-random.org/download.html&#34;&gt;minimal C implementation on the PCG website&lt;/a&gt;.
This is because “PCG” isn’t a single function, but more of a recipe for constructing PRNG functions.
It works by starting with an LCG, and then applying a permutation function to mix around the bits
and improve the quality of the results. There are many possible permutation functions, and
&lt;a href=&#34;https://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf&#34;&gt;O’Neill’s original PCG paper&lt;/a&gt;
provides a set of building blocks that can be combined in various ways to get generators with
different characteristics. In particular, the PCG used by Jarzynski and Olano corresponds to the
32-bit “RXS-M-XS” variant described in §6.3.4 of O’Neill. (See also the list of variants on
&lt;a href=&#34;https://en.wikipedia.org/wiki/Permuted_congruential_generator#Variants&#34;&gt;Wikipedia&lt;/a&gt;.)&lt;/p&gt;
&lt;h2 id=&#34;hash-or-prng&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/#hash-or-prng&#34; title=&#34;Permalink to this section&#34;&gt;Hash or PRNG?&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One of the main points I discussed in my 2013 article was the distinction between PRNGs and hash
functions: the former are designed for a good distribution &lt;em&gt;within&lt;/em&gt; a single stateful stream, but do
not necessarily provide good distribution &lt;em&gt;across&lt;/em&gt; streams with consecutive seeds; hash functions are
stateless and designed to give a good distribution even with consecutive (or otherwise highly
correlated) inputs.&lt;/p&gt;
&lt;p&gt;PCG is actually designed to be a PRNG, &lt;em&gt;not&lt;/em&gt; a hash function, so it may surprise you to see it being
used as a hash here. What gives? Well, apparently PCG is just so good that it works well as a hash
function too! ¯\_(ツ)_/¯&lt;/p&gt;
&lt;p&gt;It’s worth noting that PCG &lt;em&gt;does&lt;/em&gt; support more or less efficient jump-ahead, owing to the LCG at its
core; it’s possible to advance an LCG by $n$ steps in only $O(\log n)$ work using
&lt;a href=&#34;https://www.nayuki.io/page/fast-skipping-in-a-linear-congruential-generator&#34;&gt;modular exponentiation&lt;/a&gt;.
However, that is not what Jarzynski and Olano’s code does: it’s not jumping ahead to the $n$th
value in a single PCG sequence, but essentially just taking the first value from each of $n$
sequences with consecutive initial states. The fact that this works at all is somewhat surprising,
and a testament to the power of permutation functions.&lt;/p&gt;
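&lt;p&gt;For the curious, here’s roughly what LCG jump-ahead looks like, as a Python sketch. It treats one LCG step as the affine map $x \mapsto ax + c$ and composes that map with itself by repeated squaring. The function names are hypothetical, and this is just one way to write it under those assumptions, not code from the paper or the PCG library.&lt;/p&gt;

```python
WRAP = 2**32
MULT = 747796405
INC = 2891336453

def lcg_step(state):
    # One ordinary LCG step, with 32-bit wraparound.
    return (state * MULT + INC) % WRAP

def lcg_jump(state, n):
    # Advance the LCG by n steps in O(log n) work.
    # (acc_mult, acc_inc) is the affine map accumulated so far;
    # (cur_mult, cur_inc) is the map for 2^k steps at the k-th iteration.
    acc_mult, acc_inc = 1, 0
    cur_mult, cur_inc = MULT, INC
    while n:
        if n % 2:
            acc_mult = (acc_mult * cur_mult) % WRAP
            acc_inc = (acc_inc * cur_mult + cur_inc) % WRAP
        # Compose the current map with itself to double its step count.
        cur_inc = ((cur_mult + 1) * cur_inc) % WRAP
        cur_mult = (cur_mult * cur_mult) % WRAP
        n //= 2
    return (acc_mult * state + acc_inc) % WRAP

# Jumping 1000 steps at once matches stepping 1000 times.
s = 12345
t = s
for _ in range(1000):
    t = lcg_step(t)
print(lcg_jump(s, 1000) == t)
```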
&lt;p&gt;In my previous article, I also recommended that if you need multiple random values per pixel, you
could start with a hash function and then iterate either LCG or Xorshift using the hash output as an
initial state. You can still do that, using PCG as the initial hash—but it might be just as fast
to iterate PCG. The interesting thing about PCG’s design is that only the LCG portion of it actually
carries data dependencies from one iteration to the next, and LCGs are super fast. The permutation
parts are independent of each other and can be pipelined to exploit instruction-level parallelism
when doing multiple iterations.&lt;/p&gt;
&lt;p&gt;For completeness, the “PRNG form” of the above PCG variant looks like:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;kt&#34;&gt;uint&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;rng_state&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;

&lt;span class=&#34;kt&#34;&gt;uint&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;rand_pcg&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;uint&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;state&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;rng_state&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;rng_state&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;rng_state&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;747796405&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;u&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;+&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;2891336453&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;u&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;uint&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;word&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;((&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;state&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;((&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;state&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;28&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;u&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;+&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;4&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;u&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;^&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;state&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;277803737&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;u&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;return&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;word&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;22&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;u&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;^&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;word&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
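&lt;p&gt;To make the “hash once, then iterate” pattern concrete, here’s a Python sketch of drawing several values per pixel. The &lt;code&gt;PcgRng&lt;/code&gt; class, the &lt;code&gt;pixel_samples&lt;/code&gt; helper and the row-major pixel index used as the seed are all assumptions of mine for illustration; 32-bit wraparound is again emulated with a modulo.&lt;/p&gt;

```python
WRAP = 2**32

def lcg_next(state):
    # The LCG portion: the only part carrying a dependency between iterations.
    return (state * 747796405 + 2891336453) % WRAP

def permute(state):
    # The RXS-M-XS output permutation; it depends only on the state passed in.
    word = (((state >> ((state >> 28) + 4)) ^ state) * 277803737) % WRAP
    return (word >> 22) ^ word

class PcgRng:
    def __init__(self, seed):
        # One LCG step plus the permutation is the hash form of PCG;
        # use its output as the initial state to decorrelate nearby seeds.
        self.state = permute(lcg_next(seed % WRAP))

    def next_uint(self):
        # PRNG form: advance the LCG, output the permuted old state.
        old = self.state
        self.state = lcg_next(old)
        return permute(old)

def pixel_samples(x, y, width, count):
    # Seed from a row-major pixel index, then draw `count` values.
    rng = PcgRng(y * width + x)
    return [rng.next_uint() for _ in range(count)]

print(pixel_samples(3, 4, 640, 4))
```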

&lt;p&gt;That’s about it! Be sure to check out &lt;a href=&#34;http://jcgt.org/published/0009/03/02/&#34;&gt;Jarzynski and Olano’s paper&lt;/a&gt;
for some more tidbits, including a discussion of hashes with multi-dimensional inputs and outputs.&lt;/p&gt;</description>
			</item>
			<item>
				<title>Making Your Own Container Compatible With C++20 Ranges</title>
				<link>https://www.reedbeta.com/blog/ranges-compatible-containers/</link>
				<guid>https://www.reedbeta.com/blog/ranges-compatible-containers/</guid>
				<dc:creator>Nathan Reed</dc:creator>
<pubDate>Sat, 20 Mar 2021 17:23:15 -0700</pubDate><comments>https://www.reedbeta.com/blog/ranges-compatible-containers/#comments</comments>					<category>Coding</category>
				<description>&lt;p&gt;With some of my spare time lately, I’ve been enjoying learning about some of the new features in
C++20. &lt;a href=&#34;https://en.cppreference.com/w/cpp/language/constraints&#34;&gt;Concepts&lt;/a&gt; and the closely-related
&lt;a href=&#34;https://akrzemi1.wordpress.com/2020/03/26/requires-clause/&#34;&gt;&lt;code&gt;requires&lt;/code&gt; clauses&lt;/a&gt; are two great
extensions to template syntax that remove the necessity for all the SFINAE junk we used to have to
do, making our code both more readable and more precise, and providing much better error messages
(although MSVC has sadly been &lt;a href=&#34;https://developercommunity.visualstudio.com/t/786814&#34;&gt;lagging in the error messages department&lt;/a&gt;,
at the time of this writing).&lt;/p&gt;
&lt;p&gt;Another interesting C++20 feature is the addition of the &lt;a href=&#34;https://en.cppreference.com/w/cpp/ranges&#34;&gt;ranges library&lt;/a&gt;
(also &lt;a href=&#34;https://en.cppreference.com/w/cpp/algorithm/ranges&#34;&gt;ranges algorithms&lt;/a&gt;), which provides a
nicer, more composable abstraction for operating on containers and sequences of objects. At the most
basic level, a range wraps an iterator begin/end pair, but there’s much more to it than that. This
article isn’t going to be a tutorial on ranges, but &lt;a href=&#34;https://www.youtube.com/watch?v=VmWS-9idT3s&#34;&gt;here’s a talk&lt;/a&gt;
to watch if you want to see more of what it’s all about.&lt;/p&gt;
&lt;p&gt;What I’m going to discuss today is the process of adding “ranges compatibility” to your own container
class. Many of the C++ codebases we work in have their own set of container classes beyond the STL
ones, for a variety of reasons—&lt;a href=&#34;/blog/data-oriented-hash-table/&#34;&gt;better performance&lt;/a&gt;, more control
over memory layouts, more customized interfaces, and so on. With a little work, it’s possible to
make your custom containers also function as ranges and interoperate with the C++20 ranges library.
Here’s how to do it.&lt;/p&gt;
&lt;!--more--&gt;

&lt;div class=&#34;toc&#34;&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#making-your-container-an-input-range&#34;&gt;Making Your Container an Input Range&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#range-concepts&#34;&gt;Range Concepts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#defining-range-compatible-iterators&#34;&gt;Defining Range-Compatible Iterators&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#begin-end-size&#34;&gt;Begin, End, Size&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#accepting-output-from-ranges&#34;&gt;Accepting Output From Ranges&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#constructor-from-a-range&#34;&gt;Constructor From A Range&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#output-iterators&#34;&gt;Output Iterators&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#conclusion&#34;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&#34;making-your-container-an-input-range&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#making-your-container-an-input-range&#34; title=&#34;Permalink to this section&#34;&gt;Making Your Container an Input Range&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At the high level, there are two basic ways that a container class can interact with ranges. First,
it can be &lt;em&gt;readable&lt;/em&gt; as a range, meaning that we can iterate over it, pipe it into views and pass it
to range algorithms, and so forth. In the parlance of the ranges library, this is known as being an
&lt;em&gt;input range&lt;/em&gt;: a range that can provide input to other things.&lt;/p&gt;
&lt;p&gt;The other direction is to accept output &lt;em&gt;from&lt;/em&gt; ranges, storing the output into your container.
We’ll do that later. To begin with, let’s see how to make your container act as an input range.&lt;/p&gt;
&lt;h3 id=&#34;range-concepts&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#range-concepts&#34; title=&#34;Permalink to this section&#34;&gt;Range Concepts&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The first decision we have to make is what particular kind of input range we can model. The C++20
STL defines a number of different &lt;a href=&#34;https://en.cppreference.com/w/cpp/ranges#Range_concepts&#34;&gt;concepts for ranges&lt;/a&gt;,
depending on the capabilities of their iterators and other things. Several of these form a hierarchy
from more general to more specific kinds of ranges with tighter requirements. Generally speaking, it’s
best for your container to implement the most specific range concept it’s able to. This enables code
that works with ranges to make better decisions and use more efficient code paths. (We’ll see some
examples of this in a minute.)&lt;/p&gt;
&lt;p&gt;The relevant input range concepts are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;std::ranges::input_range&lt;/code&gt;: the most bare-bones version. It requires only that you have iterators
  that can retrieve the contents of the range. In particular, it &lt;em&gt;doesn’t&lt;/em&gt; require that the range
  can be iterated more than once: iterators are not required to be copyable, and &lt;code&gt;begin&lt;/code&gt;/&lt;code&gt;end&lt;/code&gt; are
  not required to give you the iterators more than once. This could be an appropriate concept for
  ranges that are actually generating their contents as the result of some algorithm that’s not
  easily/cheaply repeatable, or receiving data from a network connection or suchlike.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;std::ranges::forward_range&lt;/code&gt;: the range can be iterated as many times as you like, but only in
  the forward direction. Iterators can be copied and saved off to later resume iteration from an
  earlier point, for example.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;std::ranges::bidirectional_range&lt;/code&gt;: iterators can be decremented as well as incremented.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;std::ranges::random_access_range&lt;/code&gt;: you can efficiently do arithmetic on iterators—you can
  offset them forward or backward by a given number of steps, or subtract them to find the number
  of steps between.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;std::ranges::contiguous_range&lt;/code&gt;: the elements are actually stored as a contiguous array in memory;
  the iterators are essentially fancy pointers (or literally &lt;em&gt;are&lt;/em&gt; just pointers).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In addition to this hierarchy of input range concepts, there are a couple of other standalone ones
worth mentioning:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;std::ranges::sized_range&lt;/code&gt;: you can efficiently get the size of the range, i.e. how many elements
  there are from begin to end. Note that this is a much looser constraint than &lt;code&gt;random_access_range&lt;/code&gt;: the
  latter requires you be able to efficiently measure the distance between &lt;em&gt;any pair&lt;/em&gt; of iterators
  inside the range, while &lt;code&gt;sized_range&lt;/code&gt; only requires that the size of the &lt;em&gt;whole range&lt;/em&gt; is known.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;std::ranges::borrowed_range&lt;/code&gt;: indicates that a range doesn’t own its data, i.e. it’s referencing
  (“borrowing”) data that lives somewhere else. This can be useful because it allows references/iterators
  into the data to survive beyond the lifetime of the range object itself.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The reason all these concepts are important is that if I’m writing code that operates on ranges, I might need to
require some of these concepts in order to do my work efficiently. For example, a sorting routine
would be very difficult to write for anything less than a &lt;code&gt;random_access_range&lt;/code&gt; (and indeed you’ll
see that &lt;a href=&#34;https://en.cppreference.com/w/cpp/algorithm/ranges/sort&#34;&gt;&lt;code&gt;std::ranges::sort&lt;/code&gt; requires that&lt;/a&gt;).
In other cases, I might be able to do things more optimally when the range satisfies certain
concepts—for instance, if it’s a &lt;code&gt;sized_range&lt;/code&gt;, I could preallocate some storage for results,
while if it’s only an &lt;code&gt;input_range&lt;/code&gt; and no more, then I’ll have to dynamically reallocate, as I have
no idea how many elements there are going to be.&lt;/p&gt;
&lt;p&gt;The rest of the ranges library is written in terms of these concepts (and you can write your own
code that operates generically on ranges using these concepts as well). So, once your container
satisfies the relevant concepts, it will automatically be recognized and function as a range!&lt;/p&gt;
&lt;p&gt;In C++20, concepts act as boolean expressions, so you can check whether your container satisfies the
concepts you expect by just writing asserts for them:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;cp&#34;&gt;#include&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;cpf&#34;&gt;&amp;lt;ranges&amp;gt;&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;static_assert&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ranges&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;forward_range&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MyCoolContainer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// int is just an arbitrarily chosen element type, since we&lt;/span&gt;
&lt;span class=&#34;c1&#34;&gt;// can&amp;#39;t assert a concept for an uninstantiated template&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Checks like this are great to add to your test suite. I’m a big fan of writing &lt;em&gt;compile-time&lt;/em&gt;
tests for generic/metaprogramming stuff, in addition to the usual runtime tests.&lt;/p&gt;
&lt;p&gt;However, when you first drop that assert into your code, it will almost certainly fail. Let’s see
now what you need to do to actually satisfy the range concepts.&lt;/p&gt;
&lt;h3 id=&#34;defining-range-compatible-iterators&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#defining-range-compatible-iterators&#34; title=&#34;Permalink to this section&#34;&gt;Defining Range-Compatible Iterators&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In order to satisfy the input range concepts, you need to do two things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Have &lt;code&gt;begin&lt;/code&gt; and &lt;code&gt;end&lt;/code&gt; functions that return some iterator and sentinel types. (We’ll discuss
  these in a little bit.)&lt;/li&gt;
&lt;li&gt;The iterator type must satisfy the iterator concept that matches your range concept.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each one of the concepts from &lt;code&gt;input_range&lt;/code&gt; down to &lt;code&gt;contiguous_range&lt;/code&gt; has a corresponding
&lt;a href=&#34;https://en.cppreference.com/w/cpp/header/iterator#Iterator_concepts&#34;&gt;iterator concept&lt;/a&gt;:
&lt;code&gt;std::input_iterator&lt;/code&gt;, &lt;code&gt;std::forward_iterator&lt;/code&gt;, and so on. It’s these concepts that contain the real
meat of the requirements that define the different types of ranges: they list all the operations
each kind of iterator must support.&lt;/p&gt;
&lt;p&gt;To begin with, there are a couple of member type aliases that any iterator class will need to define:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;difference_type&lt;/code&gt;: some signed integer type, usually &lt;code&gt;std::ptrdiff_t&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;value_type&lt;/code&gt;: the type of elements that the iterator references&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The second one seems pretty understandable, but the &lt;code&gt;difference_type&lt;/code&gt;
requirement is less obvious. Taking the difference between iterators isn’t defined until you get to
random-access iterators, which actually provide that operation. However, generic utilities that
count steps, such as &lt;code&gt;std::ranges::distance&lt;/code&gt; and &lt;code&gt;std::ranges::advance&lt;/code&gt;, express
those counts in the iterator’s &lt;code&gt;difference_type&lt;/code&gt; even for plain input iterators, so
according to the C++ standard, it has to be there. The usual idiom is to set it to
&lt;code&gt;std::ptrdiff_t&lt;/code&gt; in such cases, although it can be any signed integer type.&lt;/p&gt;
&lt;p&gt;(Technically you can also define these types by specializing &lt;code&gt;std::iterator_traits&lt;/code&gt; for your iterator,
but here we’re just going to put them in the class.)&lt;/p&gt;
&lt;p&gt;Beyond that, the requirements for &lt;code&gt;std::input_iterator&lt;/code&gt; are pretty straightforward:&lt;/p&gt;
&lt;ul&gt;
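&lt;li&gt;The iterator must be default-initializable and movable. (It doesn’t have to be copyable. Also, a later defect report against C++20, P2325R3, dropped the default-initializable requirement for input-only iterators, so newer standard libraries may not enforce it; forward iterators and above still need it.)&lt;/li&gt;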
&lt;li&gt;It must be equality-comparable with its sentinel (the value marking the end of the range). It
  doesn’t have to be equality-comparable with other iterators.&lt;/li&gt;
&lt;li&gt;It must implement &lt;code&gt;operator ++&lt;/code&gt;, in &lt;em&gt;both&lt;/em&gt; preincrement and postincrement positions. However, the
  postincrement version does not have to return anything.&lt;/li&gt;
&lt;li&gt;It must have an &lt;code&gt;operator *&lt;/code&gt; that returns a reference to whatever the &lt;code&gt;value_type&lt;/code&gt; is.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One point of interest here is that the default-initializable requirement means that the iterator class
can’t contain references, e.g. a reference to the container it comes from. It can store pointers,
though.&lt;/p&gt;
&lt;p&gt;A prototype input iterator class could look like this:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;k&#34;&gt;template&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;typename&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nc&#34;&gt;T&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;class&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nc&#34;&gt;Iterator&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;public&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;:&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;difference_type&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;ptrdiff_t&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;using&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;value_type&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;T&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Iterator&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;                 &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// default-initializable&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;bool&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;operator&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;==&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;const&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Sentinel&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;const&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;   &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// equality with sentinel&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;T&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;operator&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;const&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;     &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// dereferenceable&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Iterator&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;operator&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;++&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// pre-incrementable&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;cm&#34;&gt;/*do stuff...*/&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;return&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;this&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;void&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;operator&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;++&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// post-incrementable&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;++*&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;this&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;private&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;:&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// implementation...&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For a &lt;code&gt;std::forward_iterator&lt;/code&gt;, the requirements are just slightly tighter:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The iterator must be copyable.&lt;/li&gt;
&lt;li&gt;It must be equality-comparable with other iterators of the same container.&lt;/li&gt;
&lt;li&gt;The postincrement operator must return a copy of the iterator before modification.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A prototype forward iterator class could look like:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;k&#34;&gt;template&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;typename&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nc&#34;&gt;T&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;class&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nc&#34;&gt;Iterator&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;public&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;:&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// ...same as the previous one, except:&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;bool&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;operator&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;==&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;const&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Iterator&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;const&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;   &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// equality with iterators&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Iterator&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;operator&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;++&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// post-incrementable, returns prev value&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Iterator&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;temp&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;this&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;++*&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;this&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;return&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;temp&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I’m not going to go through the rest of them in detail; you can read the full requirements
&lt;a href=&#34;https://en.cppreference.com/w/cpp/header/iterator#Iterator_concepts&#34;&gt;on cppreference&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;begin-end-size&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#begin-end-size&#34; title=&#34;Permalink to this section&#34;&gt;Begin, End, Size&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Once your container is equipped with an iterator class that satisfies the relevant concepts, you’ll
need to provide &lt;code&gt;begin&lt;/code&gt; and &lt;code&gt;end&lt;/code&gt; functions to get those iterators. There are three ways to do this:
they can be member functions on the container, they can be free functions that live next to the
container in the same namespace, or they can be &lt;a href=&#34;https://www.justsoftwaresolutions.co.uk/cplusplus/hidden-friends.html&#34;&gt;“hidden friends”&lt;/a&gt;;
in the latter two cases, they just need to be findable by &lt;a href=&#34;https://en.cppreference.com/w/cpp/language/adl&#34;&gt;ADL&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The return types from &lt;code&gt;begin&lt;/code&gt; and &lt;code&gt;end&lt;/code&gt; don’t have to be the same. In some cases, it can be useful
to have &lt;code&gt;end&lt;/code&gt; return a different type of object, a “sentinel”, which isn’t actually an iterator; it
just needs to be equality-comparable with iterators, so you can tell when you’ve gotten to the end
of the container.&lt;/p&gt;
&lt;p&gt;Also, these are the same &lt;code&gt;begin&lt;/code&gt;/&lt;code&gt;end&lt;/code&gt; used for &lt;a href=&#34;https://en.cppreference.com/w/cpp/language/range-for&#34;&gt;range-based &lt;code&gt;for&lt;/code&gt; loops&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One oddity worth mentioning here is that if you go the free/friend functions route, you’ll need to
add overloads for both const and non-const versions of your container:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;k&#34;&gt;class&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nc&#34;&gt;MyCoolContainer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;begin&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;const&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MyCoolContainer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;c&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;end&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;const&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MyCoolContainer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;c&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;begin&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MyCoolContainer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;c&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;end&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MyCoolContainer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;c&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You might think it would be enough to provide just the const overloads, but if you do that, only the
const version of the container will be recognized as a range! The non-const overloads must be
present as well for non-const containers to work.&lt;/p&gt;
&lt;p&gt;Curiously, if you provide &lt;code&gt;begin&lt;/code&gt;/&lt;code&gt;end&lt;/code&gt; as member functions instead, then this doesn’t come up:
const overloads will work for both.&lt;/p&gt;
&lt;p&gt;This behavior is surprising, and I’m not sure if it was intended. However, it’s worth noting that
iterators generally need to remember the constness of the container they came from: a const
container should give you a “const iterator” that doesn’t allow mutating its elements. Therefore,
the const and non-const overloads of &lt;code&gt;begin&lt;/code&gt;/&lt;code&gt;end&lt;/code&gt; will generally need to return &lt;em&gt;different&lt;/em&gt;
iterator types, and so you’ll need to have both in any case. (The exception would be if you’re
building an immutable container; then it only needs a const iterator type.)&lt;/p&gt;
&lt;p&gt;In addition to &lt;code&gt;begin&lt;/code&gt; and &lt;code&gt;end&lt;/code&gt;, you’ll also want to implement a &lt;code&gt;size&lt;/code&gt; function, if applicable.
Again, this can be a member function, a free function, or a hidden friend. The
presence of this function satisfies &lt;code&gt;std::ranges::sized_range&lt;/code&gt;, which (as mentioned earlier) can
enable range algorithms to operate more efficiently.&lt;/p&gt;
&lt;p&gt;So, to sum up: to allow your custom container class to be readable as a range, you’ll need to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Decide which range concept(s) you can model, which mainly comes down to what level of iterator
    capabilities you can provide;&lt;/li&gt;
&lt;li&gt;Implement iterator classes (both const and non-const, if applicable) that fulfill all the
    requirements of the chosen iterator concept;&lt;/li&gt;
&lt;li&gt;Implement &lt;code&gt;begin&lt;/code&gt;, &lt;code&gt;end&lt;/code&gt;, and &lt;code&gt;size&lt;/code&gt; functions.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Once you’ve done this, the ranges library should recognize your container as a range. It will
automatically be accepted by range algorithms, you can take views of it, you can iterate over it in
range-for loops, and so on.&lt;/p&gt;
&lt;p&gt;As before, you can test that you’ve done everything correctly by asserting that your container
satisfies the expected range concepts. If you’re working with gcc or clang, this will even give you
some pretty reasonable error messages if you didn’t get it right! (In MSVC, for the time being, you’ll
have to narrow down errors by popping open the hood and asserting each of the concept’s sub-clauses
one at a time, to see which one(s) failed.)&lt;/p&gt;
&lt;h2 id=&#34;accepting-output-from-ranges&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#accepting-output-from-ranges&#34; title=&#34;Permalink to this section&#34;&gt;Accepting Output From Ranges&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;We’ve discussed how to make a custom container serve as input &lt;em&gt;to&lt;/em&gt; the C++20 ranges library. Now, we
need to come back to the other direction: how to let your container capture output &lt;em&gt;from&lt;/em&gt; the
ranges library.&lt;/p&gt;
&lt;p&gt;There are a couple of different forms this can take. One way is to accept generic ranges as
parameters to a constructor (or other methods, such as append or insert methods) of your container
class. This allows, for example, easily converting other containers (that are also range-compatible)
to your container. It also allows capturing the output of a ranges “pipeline” (a series of views
chained together).&lt;/p&gt;
&lt;p&gt;Another form of range output, which comes up with some of the &lt;a href=&#34;https://en.cppreference.com/w/cpp/algorithm/ranges&#34;&gt;range algorithms&lt;/a&gt;,
is via &lt;em&gt;output iterators&lt;/em&gt;, which are iterators that allow storing or inserting values into your
container.&lt;/p&gt;
&lt;h3 id=&#34;constructor-from-a-range&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#constructor-from-a-range&#34; title=&#34;Permalink to this section&#34;&gt;Constructor From A Range&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;To write a constructor (or other method) that takes a generic range parameter, we can use the same
range concepts we saw earlier. One neat new feature in C++20 is writing functions with a parameter
type (or return type) constrained to match a given concept. The syntax looks like this:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;cp&#34;&gt;#include&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;cpf&#34;&gt;&amp;lt;ranges&amp;gt;&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;class&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nc&#34;&gt;MyCoolContainer&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;public&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;:&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;explicit&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MyCoolContainer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ranges&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;input_range&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;for&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;item&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;            &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// process the item&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The syntax &lt;code&gt;concept-name auto&lt;/code&gt; for the parameter type reminds us that concepts aren’t types; this
is still, under the hood, a template function that’s performing argument type deduction (hence the
&lt;code&gt;auto&lt;/code&gt;). In other words, the above is syntactic sugar for:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;k&#34;&gt;template&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ranges&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;input_range&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;R&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;explicit&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MyCoolContainer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;R&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// ...&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;which is in turn sugar for:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;k&#34;&gt;template&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;typename&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nc&#34;&gt;R&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;requires&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ranges&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;input_range&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;R&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;explicit&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MyCoolContainer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;R&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// ...&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I prefer the shorthand &lt;code&gt;std::ranges::input_range auto&lt;/code&gt; syntax, but &lt;del&gt;at the time of this writing
MSVC’s support for it is still shaky&lt;/del&gt;. (&lt;em&gt;Update: fixed in 16.10!&lt;/em&gt; 😊) If in doubt, use
the syntax &lt;code&gt;template &amp;lt;std::ranges::input_range R&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In any case, constraining the parameter type to satisfy &lt;code&gt;input_range&lt;/code&gt; allows this constructor
overload to accept anything out there that implements &lt;code&gt;begin&lt;/code&gt;, &lt;code&gt;end&lt;/code&gt;, and iterators, as we’ve seen
in previous sections. You can then iterate over it generically and do whatever you want with the
results.&lt;/p&gt;
&lt;p&gt;The range parameter is declared as &lt;code&gt;auto&amp;amp;&amp;amp;&lt;/code&gt; to make it a &lt;a href=&#34;https://isocpp.org/blog/2012/11/universal-references-in-c11-scott-meyers&#34;&gt;universal reference&lt;/a&gt; (also known as a forwarding reference),
meaning that it can accept either lvalues or rvalues; in particular, it can accept the result of a
function call returning a range, and it can accept the result of a pipeline:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;n&#34;&gt;MyCoolContainer&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;c&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;another_range&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;|&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;                   &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;views&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;transform&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;blah&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;|&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;                   &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;views&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;filter&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;blah&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A completely generic range-accepting method like this might not be the most useful thing. If we have
a container storing &lt;code&gt;int&lt;/code&gt; values, for example, it wouldn’t make a lot of sense for us to accept
ranges of strings or other arbitrary types. We’d like to be able to put some additional constraints
on the &lt;em&gt;element type&lt;/em&gt; of the range: perhaps we only want element types that are convertible to &lt;code&gt;int&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Helpfully, the ranges library provides a template &lt;a href=&#34;https://en.cppreference.com/w/cpp/ranges/iterator_t&#34;&gt;&lt;code&gt;range_value_t&lt;/code&gt;&lt;/a&gt;
that retrieves the element type of a range—namely, the &lt;code&gt;value_type&lt;/code&gt; declared by the range’s
iterator. With this, we can state additional constraints like so:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;k&#34;&gt;explicit&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MyCoolContainer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ranges&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;input_range&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;requires&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;convertible_to&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ranges&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;range_value_t&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;decltype&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// ...&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We can even define a concept that wraps up these requirements:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;k&#34;&gt;template&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;typename&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nc&#34;&gt;R&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;typename&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nc&#34;&gt;T&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;concept&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nc&#34;&gt;input_range_of&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ranges&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;input_range&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;R&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;convertible_to&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ranges&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;range_value_t&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;R&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;T&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;and then use it as follows:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;k&#34;&gt;explicit&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MyCoolContainer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;input_range_of&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// ...&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Something like this should be in the standard library, IMO.&lt;/p&gt;
&lt;p&gt;You can also choose to require one of the more specialized concepts, like &lt;code&gt;forward_range&lt;/code&gt; or
&lt;code&gt;random_access_range&lt;/code&gt;, if you need those extra capabilities for whatever you’re doing.
However, just as a container should generally implement the most &lt;em&gt;specific&lt;/em&gt; range concept it can
provide, a function that takes a range parameter should generally require the most &lt;em&gt;general&lt;/em&gt; range
concept it can deal with, or it will unduly restrict what kind of ranges can be passed to it.&lt;/p&gt;
&lt;p&gt;That said, there might be cases where you can switch to a more efficient implementation if the range
satisfies some extra requirements. For example, if it’s a &lt;code&gt;sized_range&lt;/code&gt;, then you might be able to
reserve storage before inserting the elements. You can test for this inside your function body using
&lt;code&gt;if constexpr&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;k&#34;&gt;explicit&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MyCoolContainer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;input_range_of&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;int&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;if&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;constexpr&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ranges&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;sized_range&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;decltype&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;reserve&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ranges&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;size&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;));&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;

&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;for&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;item&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// process the item&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Here, &lt;a href=&#34;https://en.cppreference.com/w/cpp/ranges/size&#34;&gt;&lt;code&gt;std::ranges::size&lt;/code&gt;&lt;/a&gt; is a convenience wrapper
that knows how to call the range’s associated &lt;code&gt;size&lt;/code&gt; function, whether it’s implemented as a method
or a free function.&lt;/p&gt;
&lt;p&gt;You could also check whether the range is a &lt;code&gt;contiguous_range&lt;/code&gt; whose element type is
trivially copyable, and if so switch to &lt;code&gt;memcpy&lt;/code&gt; rather than iterating over the items one by one.&lt;/p&gt;
&lt;h3 id=&#34;output-iterators&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#output-iterators&#34; title=&#34;Permalink to this section&#34;&gt;Output Iterators&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Range views and pipelines operate on a “pull” model, where the pipeline is represented by a proxy
range object that generates its results lazily when you iterate it. Taking generic range objects as
parameters to your container is an easy and useful way to consume such objects, and that probably suffices
for most uses. However, there are a handful of bits in the ranges library that operate on a “push”
model, where you call a function that wants to store values into your container via an output
iterator. This comes up with &lt;a href=&#34;https://en.cppreference.com/w/cpp/algorithm/ranges#Modifying_sequence_operations&#34;&gt;certain ranges algorithms&lt;/a&gt;
like &lt;code&gt;ranges::copy&lt;/code&gt;, &lt;code&gt;ranges::transform&lt;/code&gt;, and &lt;code&gt;ranges::generate&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Personally, I don’t see a hugely compelling reason to worry about these, as it’s also possible to
use views to express the same operations; but for the sake of completeness, I’ll discuss them
briefly here.&lt;/p&gt;
&lt;p&gt;At this point, it won’t surprise you to learn that just as there were concepts for input ranges,
there are also concepts &lt;code&gt;std::ranges::output_range&lt;/code&gt; and &lt;a href=&#34;https://en.cppreference.com/w/cpp/iterator/output_iterator&#34;&gt;&lt;code&gt;std::output_iterator&lt;/code&gt;&lt;/a&gt;.
In this case there’s just the one concept for each, rather than a hierarchy of refinements; however, if you
peruse the definitions of some of the ranges algorithms, you’ll find that many of them don’t actually
use &lt;code&gt;output_iterator&lt;/code&gt;, but state slightly different, less- or more-specific requirements of their
own. (This part of the standard library feels a little less fully baked than the rest; I wouldn’t be
surprised if some of this gets elaborated or polished a bit more in C++23 or later revisions.)&lt;/p&gt;
&lt;p&gt;The requirements for an output iterator (broadly construed) are very similar to those for an input
iterator, only adding that the value returned by dereferencing the iterator must be writable by
assigning to it: you must be able to do &lt;code&gt;*iter = foo;&lt;/code&gt; for some appropriate type of &lt;code&gt;foo&lt;/code&gt;. If you’ve
implemented a non-const input iterator whose &lt;code&gt;operator*&lt;/code&gt; returns a mutable reference, it probably satisfies the requirement already.&lt;/p&gt;
&lt;p&gt;It’s also possible to do slightly more exotic things with an output iterator, like returning a proxy
object that accepts assignment and does “something” with the value assigned. An example of this is
the STL’s &lt;a href=&#34;https://en.cppreference.com/w/cpp/iterator/back_insert_iterator&#34;&gt;&lt;code&gt;std::back_insert_iterator&lt;/code&gt;&lt;/a&gt;,
which takes whatever is assigned to it and &lt;em&gt;appends&lt;/em&gt; to its container (as opposed to overwriting an
existing value in the container). The STL has a few more things like that, including an iterator
that writes characters out to an &lt;code&gt;ostream&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;There are also some cases amongst the ranges algorithms of “input-output” iterators, such as for
operations that reorder a range in place, like sorting. These often have a bidirectional or
random-access iterator requirement, plus needing the dereferenced types to be swappable, movable,
and varying other constraints. Those details probably aren’t going to be relevant to you unless
you’re doing something tricky, like making a container that generates elements on the fly somehow,
or returns proxy objects rather than direct references to elements (like &lt;code&gt;std::vector&amp;lt;bool&amp;gt;&lt;/code&gt;).&lt;/p&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/ranges-compatible-containers/#conclusion&#34; title=&#34;Permalink to this section&#34;&gt;Conclusion&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The C++20 ranges library provides a lot of powerful, composable tools for manipulating sequences of
objects, at levels of specificity ranging from the most generic and abstract container-shaped things down
to the very concrete, efficient, and practical. When working with your own container types, it
would be nice to be able to take advantage of these tools.&lt;/p&gt;
&lt;p&gt;As we’ve seen, it’s hardly an onerous task to implement ranges compatibility for your own containers.
Most of the necessaries are things you were probably already doing: you probably already had an
iterator class and begin/end methods. It only takes a little attention to certain
details—like adding the &lt;code&gt;difference_type&lt;/code&gt; and &lt;code&gt;value_type&lt;/code&gt; aliases, and making sure you can both
preincrement and postincrement—to make your iterators satisfy the STL iterator concepts, and thus
have your containers recognized as ranges. It’s also no sweat to write functions accepting generic
ranges as input, letting you store the output of other range operations into your container.&lt;/p&gt;
&lt;p&gt;I hope this has been a useful peek under the hood and has given you some ideas about how your
container classes can benefit from the new C++20 features.&lt;/p&gt;</description>
			</item>
			<item>
				<title>Python-Like enumerate() In C++17</title>
				<link>https://www.reedbeta.com/blog/python-like-enumerate-in-cpp17/</link>
				<guid>http://reedbeta.com/blog/python-like-enumerate-in-cpp17/</guid>
				<dc:creator>Nathan Reed</dc:creator>
<pubDate>Sat, 24 Nov 2018 22:42:04 -0800</pubDate><comments>https://www.reedbeta.com/blog/python-like-enumerate-in-cpp17/#comments</comments>					<category>Coding</category>
				<description>&lt;p&gt;Python has a handy built-in function called &lt;a href=&#34;https://docs.python.org/3/library/functions.html?highlight=enumerate#enumerate&#34;&gt;&lt;code&gt;enumerate()&lt;/code&gt;&lt;/a&gt;,
which lets you iterate over an object (e.g. a list) and have access to both the &lt;em&gt;index&lt;/em&gt; and the
&lt;em&gt;item&lt;/em&gt; in each iteration. You use it in a &lt;code&gt;for&lt;/code&gt; loop, like this:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;i&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;thing&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;enumerate&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;listOfThings&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
    &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;quot;The &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;%d&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;th thing is &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;%s&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;quot;&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;%&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;i&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;thing&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Iterating over &lt;code&gt;listOfThings&lt;/code&gt; directly would give you &lt;code&gt;thing&lt;/code&gt;, but not &lt;code&gt;i&lt;/code&gt;, and there are plenty of
situations where you’d want both (looking up the index in another data structure, progress reports,
error messages, generating output filenames, etc).&lt;/p&gt;
&lt;p&gt;C++ &lt;a href=&#34;https://en.cppreference.com/w/cpp/language/range-for&#34;&gt;range-based &lt;code&gt;for&lt;/code&gt; loops&lt;/a&gt; work a lot like
Python’s &lt;code&gt;for&lt;/code&gt; loops. Can we implement an analogue of Python’s &lt;code&gt;enumerate()&lt;/code&gt; in C++? We can!&lt;/p&gt;
&lt;!--more--&gt;

&lt;p&gt;C++17 added &lt;a href=&#34;https://en.cppreference.com/w/cpp/language/structured_binding&#34;&gt;structured bindings&lt;/a&gt;
(also known as “destructuring” in other languages), which allow you to pull apart a tuple type and
assign the pieces to different variables, in a single statement. It turns out that this is also
allowed in range &lt;code&gt;for&lt;/code&gt; loops. If dereferencing the iterator yields a tuple, you can pull it apart and assign the
pieces to different loop variables.&lt;/p&gt;
&lt;p&gt;The syntax for this looks like:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;vector&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tuple&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ThingA&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ThingB&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;things&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;...&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;for&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;a&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;b&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;things&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// a gets the ThingA and b gets the ThingB from each tuple&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
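&lt;p&gt;For a version of that loop you can compile as-is (C++17), here’s a minimal sketch. The function &lt;code&gt;summarize&lt;/code&gt; and the choice of &lt;code&gt;int&lt;/code&gt; and &lt;code&gt;std::string&lt;/code&gt; as the two tuple members are assumptions for illustration, standing in for &lt;code&gt;ThingA&lt;/code&gt; and &lt;code&gt;ThingB&lt;/code&gt;.&lt;/p&gt;

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical example: sums the int half and concatenates the string half
// of each tuple, unpacking both with a structured binding in the loop header.
std::tuple<int, std::string> summarize(
    const std::vector<std::tuple<int, std::string>>& things)
{
    int total = 0;
    std::string joined;
    for (const auto& [number, name] : things)
    {
        total += number;  // number refers to the int in the current tuple
        joined += name;   // name refers to the std::string in the current tuple
    }
    return std::make_tuple(total, joined);
}
```

&lt;p&gt;Writing &lt;code&gt;const auto&amp;amp;&lt;/code&gt; before the bracketed names binds to each tuple by reference, whereas plain &lt;code&gt;auto&lt;/code&gt; copies the tuple on every iteration.&lt;/p&gt;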

&lt;p&gt;So, we can implement &lt;code&gt;enumerate()&lt;/code&gt; by creating an iterable object that wraps another iterable and
generates the indices during iteration. Then we can use it like this:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;vector&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Thing&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;things&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;...&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;for&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;i&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;thing&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;enumerate&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;things&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// i gets the index and thing gets the Thing in each iteration&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The implementation of &lt;code&gt;enumerate()&lt;/code&gt; is pretty short, and I present it here for your use:&lt;/p&gt;
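&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;cp&#34;&gt;#include&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;cpf&#34;&gt;&amp;lt;cstddef&amp;gt;&lt;/span&gt;
&lt;span class=&#34;cp&#34;&gt;#include&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;cpf&#34;&gt;&amp;lt;iterator&amp;gt;&lt;/span&gt;
&lt;span class=&#34;cp&#34;&gt;#include&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;cpf&#34;&gt;&amp;lt;tuple&amp;gt;&lt;/span&gt;
&lt;span class=&#34;cp&#34;&gt;#include&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;cpf&#34;&gt;&amp;lt;utility&amp;gt;&lt;/span&gt;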

&lt;span class=&#34;k&#34;&gt;template&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;typename&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nc&#34;&gt;T&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;          &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;typename&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nc&#34;&gt;TIter&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;decltype&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;begin&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;declval&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;T&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;())),&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;          &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;typename&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;decltype&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;end&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;declval&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;T&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()))&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;constexpr&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;enumerate&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;T&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;iterable&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;struct&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nc&#34;&gt;iterator&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;size_t&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;i&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TIter&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;iter&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;bool&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;operator&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;!=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;const&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;iterator&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;other&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;const&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;return&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;iter&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;!=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;other&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;iter&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;void&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;operator&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;++&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;++&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;i&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;++&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;iter&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;operator&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;const&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;return&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tie&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;i&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;iter&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;struct&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nc&#34;&gt;iterable_wrapper&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;T&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;iterable&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;begin&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;return&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;iterator&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;begin&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;iterable&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;auto&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;end&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;return&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;iterator&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;end&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;iterable&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;return&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;iterable_wrapper&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;std&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;forward&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;T&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;iterable&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;};&lt;/span&gt;
&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This uses SFINAE to ensure it can only be applied to iterable types, and will generate readable
error messages if used on something else. It accepts its parameter as a forwarding reference, so you can
apply it to temporary values (e.g. directly to the return value of a function call) as well as to
variables and members.&lt;/p&gt;
&lt;p&gt;This compiles without warnings in C++17 mode on gcc 8.2, clang 6.0, and MSVC 15.9. I’ve banged on it
a bit to ensure it doesn’t incur any extra copies, and it works as expected with either const or
non-const containers. It seems to optimize away pretty cleanly, too! 🤘&lt;/p&gt;</description>
			</item>
			<item>
				<title>Using A Custom Toolchain In Visual Studio With MSBuild</title>
				<link>https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/</link>
				<guid>http://reedbeta.com/blog/custom-toolchain-with-msbuild/</guid>
				<dc:creator>Nathan Reed</dc:creator>
<pubDate>Tue, 20 Nov 2018 13:34:01 -0800</pubDate><comments>https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#comments</comments>					<category>Coding</category>
				<description>&lt;p&gt;Like many of you, when I work on a graphics project I sometimes have a need to compile some shaders.
Usually, I’m writing in C++ using Visual Studio, and I’d like to get my shaders built using the
same workflow as the rest of my code. Visual Studio these days has built-in support for HLSL via
&lt;code&gt;fxc&lt;/code&gt;, but what if we want to use the next-gen &lt;a href=&#34;https://github.com/Microsoft/DirectXShaderCompiler&#34;&gt;&lt;code&gt;dxc&lt;/code&gt;&lt;/a&gt;
compiler?&lt;/p&gt;
&lt;p&gt;This post is a how-to for adding support for a custom toolchain—such as &lt;code&gt;dxc&lt;/code&gt;, or any other
command-line-invokable tool—to a Visual Studio project, by scripting MSBuild (the underlying build
system Visual Studio uses). We won’t quite make it to parity with a natively integrated language,
but we’re going to get as close as we can.&lt;/p&gt;
&lt;!--more--&gt;

&lt;p&gt;If you don’t want to read all the explanation but just want some working code to look at, jump down
to the &lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#example-project&#34;&gt;Example Project&lt;/a&gt; section.&lt;/p&gt;
&lt;p&gt;This article is written against Visual Studio 2017, but it may also work in some earlier VSes
(I haven’t tested).&lt;/p&gt;
&lt;div class=&#34;toc&#34;&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#msbuild&#34;&gt;MSBuild&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#adding-a-custom-target&#34;&gt;Adding A Custom Target&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#invoking-the-tool&#34;&gt;Invoking The Tool&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#incremental-builds&#34;&gt;Incremental Builds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#header-dependencies&#34;&gt;Header Dependencies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#errorwarning-parsing&#34;&gt;Error/Warning Parsing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#example-project&#34;&gt;Example Project&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#the-next-level&#34;&gt;The Next Level&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&#34;msbuild&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#msbuild&#34; title=&#34;Permalink to this section&#34;&gt;MSBuild&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Before we begin, it’s important you understand what we’re getting into. Not to mince words, but
MSBuild is a &lt;a href=&#34;http://wiki.c2.com/?StringlyTyped&#34;&gt;stringly typed&lt;/a&gt;, semi-documented, XML-guzzling,
paradigmatically muddled, cursed hellmaze. However, it &lt;em&gt;does&lt;/em&gt; ship with Visual Studio, so if you
can use it for your custom build steps, then you don’t need to deal with any extra add-ins or
software installs.&lt;/p&gt;
&lt;p&gt;To be fair, MSBuild is &lt;a href=&#34;https://github.com/Microsoft/msbuild&#34;&gt;open-source on GitHub&lt;/a&gt;, so at least
in principle you can dive into it and see what the cursed hellmaze is doing. However, I’ll warn you
up front that many of the most interesting parts vis-à-vis Visual Studio integration are &lt;em&gt;not&lt;/em&gt;
included in the Git repo, but are hidden away in VS’s build extension DLLs. (More about that later.)&lt;/p&gt;
&lt;p&gt;My jumping-off point for this enterprise was &lt;a href=&#34;http://miken-1gam.blogspot.com/2013/01/visual-studio-and-custom-build-rules.html&#34;&gt;this blog post by Mike Nicolella&lt;/a&gt;.
Mike showed how to set up an MSBuild &lt;code&gt;.targets&lt;/code&gt; file to create an association between a specific file
extension in your project, and a build rule (“target”, in MSBuild parlance) to process those files.
We’ll review how that works, then extend it and jazz it up a bit to get some more quality-of-life
features.&lt;/p&gt;
&lt;p&gt;MSBuild docs (such as they are) can be found &lt;a href=&#34;https://docs.microsoft.com/en-us/visualstudio/msbuild/msbuild?view=vs-2017&#34;&gt;on Microsoft Docs here&lt;/a&gt;.
Some more information can be gleaned by looking at the C++ build rules installed with Visual
Studio; on my machine they’re in &lt;code&gt;C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets&lt;/code&gt;.
For example, the file &lt;code&gt;Microsoft.CppCommon.targets&lt;/code&gt; in that directory contains most of the target
definitions for C++ compilation, linking, resources and manifests, and so on.&lt;/p&gt;
&lt;h2 id=&#34;adding-a-custom-target&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#adding-a-custom-target&#34; title=&#34;Permalink to this section&#34;&gt;Adding A Custom Target&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As shown in Mike’s blog post, we can define our own build rule using a couple of XML files which
will be imported into the VS project. (I’ll keep using shader compilation with &lt;code&gt;dxc&lt;/code&gt; as my running
example, but this approach can be adapted for a lot of other things, too.)&lt;/p&gt;
&lt;p&gt;First, create a file &lt;code&gt;dxc.targets&lt;/code&gt;—in your project directory, or really anywhere—containing
the following:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;cp&#34;&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot; encoding=&amp;quot;utf-8&amp;quot;?&amp;gt;&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;Project&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;xmlns=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;http://schemas.microsoft.com/developer/msbuild/2003&amp;quot;&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;ItemGroup&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;cm&#34;&gt;&amp;lt;!-- Include definitions from dxc.xml, which defines the DXCShader item. --&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;PropertyPageSchema&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Include=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;$(MSBuildThisFileDirectory)dxc.xml&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;cm&#34;&gt;&amp;lt;!-- Hook up DXCShader items to be built by the DXC target. --&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;AvailableItemName&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Include=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;DXCShader&amp;quot;&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;Targets&amp;gt;&lt;/span&gt;DXC&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Targets&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;/AvailableItemName&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;/ItemGroup&amp;gt;&lt;/span&gt;

&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;Target&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Name=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;DXC&amp;quot;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Condition=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;&amp;#39;@(DXCShader)&amp;#39; != &amp;#39;&amp;#39;&amp;quot;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;BeforeTargets=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;ClCompile&amp;quot;&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;Message&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Importance=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;High&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Text=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;Building shaders!!!&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Target&amp;gt;&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Project&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And another file &lt;code&gt;dxc.xml&lt;/code&gt; containing:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;cp&#34;&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot; encoding=&amp;quot;utf-8&amp;quot;?&amp;gt;&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;ProjectSchemaDefinitions&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;xmlns=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;http://schemas.microsoft.com/build/2009/properties&amp;quot;&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;cm&#34;&gt;&amp;lt;!-- Associate DXCShader item type with .hlsl files --&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;ItemType&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Name=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;DXCShader&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;DisplayName=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;DXC Shader&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;ContentType&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Name=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;DXCShader&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;ItemType=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;DXCShader&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;DisplayName=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;DXC Shader&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;FileExtension&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Name=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;.hlsl&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;ContentType=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;DXCShader&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/ProjectSchemaDefinitions&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Let’s pause for a moment and take stock of what’s going on here. First, we’re creating a new “item
type”, called &lt;code&gt;DXCShader&lt;/code&gt;, and associating it with the extension &lt;code&gt;.hlsl&lt;/code&gt;. That way, any files we
add to our project with that extension will automatically have this item type applied.&lt;/p&gt;
&lt;p&gt;Second, we’re instructing MSBuild that &lt;code&gt;DXCShader&lt;/code&gt; items are to be built with the &lt;code&gt;DXC&lt;/code&gt; target, and
we’re defining what that target does. For now, all it does is print a message in the build output,
but we’ll get it doing some actual work shortly.&lt;/p&gt;
&lt;p&gt;A few miscellaneous syntax notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Yes, you need two separate files. No, there’s no way to combine them, AFAICT. This is just the
    way MSBuild works.&lt;/li&gt;
&lt;li&gt;The syntax &lt;code&gt;@(DXCShader)&lt;/code&gt; means “the list of all &lt;code&gt;DXCShader&lt;/code&gt; items in the project”. The &lt;code&gt;Condition&lt;/code&gt;
    attribute on a target says under what conditions that target should execute: if the condition is
    false, the target is skipped. Here, we’re executing the target if the list &lt;code&gt;@(DXCShader)&lt;/code&gt; is non-empty.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;BeforeTargets=&#34;ClCompile&#34;&lt;/code&gt; means this target will run before the &lt;code&gt;ClCompile&lt;/code&gt; target, i.e. before
    C/C++ source files are compiled with &lt;code&gt;cl.exe&lt;/code&gt;. This is because we’re going to output our shader
    bytecode to headers which will get included into C++, so the shader compile step needs to run
    earlier.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Importance=&#34;High&#34;&lt;/code&gt; is needed on the &lt;code&gt;&amp;lt;Message&amp;gt;&lt;/code&gt; task for it to show up in the VS IDE on the
    default verbosity setting. Lower importances will be masked unless you turn up the verbosity.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To get this into your project, in the VS IDE right-click the project → Build Dependencies… → Build Customizations,
then click “Find Existing” and point it at &lt;code&gt;dxc.targets&lt;/code&gt;. Alternatively, add this line to your &lt;code&gt;.vcxproj&lt;/code&gt;
(as a child of the root &lt;code&gt;&amp;lt;Project&amp;gt;&lt;/code&gt; element, doesn’t matter where):&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;Import&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Project=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;dxc.targets&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now, if you add a &lt;code&gt;.hlsl&lt;/code&gt; file to your project it should automatically show up as type “DXC Shader”
in the properties; and when you build, you should see the message &lt;code&gt;Building shaders!!!&lt;/code&gt; in the
output.&lt;/p&gt;
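&lt;p&gt;For reference, here is a sketch of what the project file might contain after adding a shader. VS records each file under its item type, so a file (hypothetically named &lt;code&gt;shader.hlsl&lt;/code&gt; here) should appear in the &lt;code&gt;.vcxproj&lt;/code&gt; roughly like this:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&amp;lt;ItemGroup&amp;gt;
  &amp;lt;!-- shader.hlsl is a placeholder filename --&amp;gt;
  &amp;lt;DXCShader Include=&amp;quot;shader.hlsl&amp;quot; /&amp;gt;
&amp;lt;/ItemGroup&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If the file shows up under a different element instead, such as &lt;code&gt;&amp;lt;FxCompile&amp;gt;&lt;/code&gt; (the item type
used by the built-in HLSL support) or &lt;code&gt;&amp;lt;None&amp;gt;&lt;/code&gt;, changing the item type in the file’s properties
should switch it over.&lt;/p&gt;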
&lt;p&gt;Incidentally, in &lt;code&gt;dxc.xml&lt;/code&gt; you can also set up property pages that will show up in the VS IDE on
&lt;code&gt;DXCShader&lt;/code&gt;-type files. This lets you define your own metadata and let users configure it per
file. I haven’t done this, but for example, you could have properties to indicate which shader
stages or profiles the file should be compiled for. The &lt;code&gt;&amp;lt;Target&amp;gt;&lt;/code&gt; element can then have logic that refers
to those properties. Many examples of the XML to define property pages can be found in &lt;code&gt;C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\1033&lt;/code&gt;
(or a corresponding location depending on which version of VS you have). For example,
&lt;code&gt;custom_build_tool.xml&lt;/code&gt; in that directory defines the properties for the built-in Custom Build
Tool item type.&lt;/p&gt;
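&lt;p&gt;As a rough idea of what such a property page definition could look like, here is a minimal, untested
sketch of a &lt;code&gt;&amp;lt;Rule&amp;gt;&lt;/code&gt; element that would sit inside &lt;code&gt;&amp;lt;ProjectSchemaDefinitions&amp;gt;&lt;/code&gt; in
&lt;code&gt;dxc.xml&lt;/code&gt;. It’s modeled on the rule files in the directory above; the rule name and the
&lt;code&gt;EntryPoint&lt;/code&gt; property are made up for illustration.&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&amp;lt;Rule Name=&amp;quot;DXCShaderRule&amp;quot; PageTemplate=&amp;quot;tool&amp;quot; DisplayName=&amp;quot;DXC Shader&amp;quot; Order=&amp;quot;200&amp;quot;&amp;gt;
  &amp;lt;Rule.DataSource&amp;gt;
    &amp;lt;!-- Store the values as metadata on DXCShader items in the project file --&amp;gt;
    &amp;lt;DataSource Persistence=&amp;quot;ProjectFile&amp;quot; ItemType=&amp;quot;DXCShader&amp;quot; /&amp;gt;
  &amp;lt;/Rule.DataSource&amp;gt;
  &amp;lt;Rule.Categories&amp;gt;
    &amp;lt;Category Name=&amp;quot;General&amp;quot; DisplayName=&amp;quot;General&amp;quot; /&amp;gt;
  &amp;lt;/Rule.Categories&amp;gt;
  &amp;lt;!-- Hypothetical per-file setting: the shader entry point name --&amp;gt;
  &amp;lt;StringProperty Name=&amp;quot;EntryPoint&amp;quot; DisplayName=&amp;quot;Entry Point&amp;quot; Category=&amp;quot;General&amp;quot; /&amp;gt;
&amp;lt;/Rule&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If this works as intended, the value set in the IDE would be available to the target as item metadata,
e.g. &lt;code&gt;%(DXCShader.EntryPoint)&lt;/code&gt;.&lt;/p&gt;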
&lt;h2 id=&#34;invoking-the-tool&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#invoking-the-tool&#34; title=&#34;Permalink to this section&#34;&gt;Invoking The Tool&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Okay, now it’s time to get our custom target to actually do something. Mike’s blog post used the MSBuild
&lt;a href=&#34;https://docs.microsoft.com/en-us/visualstudio/msbuild/exec-task?view=vs-2017&#34;&gt;&lt;code&gt;&amp;lt;Exec&amp;gt;&lt;/code&gt; task&lt;/a&gt; to
run a command on each source file. However, we’re going to take a different tack and use the
Visual Studio &lt;code&gt;&amp;lt;CustomBuild&amp;gt;&lt;/code&gt; task instead.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;&amp;lt;CustomBuild&amp;gt;&lt;/code&gt; task is the same one that ends up getting executed if you manually set your
files to “Custom Build Tool” and fill in the command/inputs/outputs metadata in the property pages.
But instead of putting that in by hand, we’re going to set up our target to &lt;em&gt;generate&lt;/em&gt; the metadata
and then pass it in to &lt;code&gt;&amp;lt;CustomBuild&amp;gt;&lt;/code&gt;. Doing it this way is going to let us access a couple handy
features later that we wouldn’t get with the plain &lt;code&gt;&amp;lt;Exec&amp;gt;&lt;/code&gt; task.&lt;/p&gt;
&lt;p&gt;Add this inside the DXC &lt;code&gt;&amp;lt;Target&amp;gt;&lt;/code&gt; element:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;cm&#34;&gt;&amp;lt;!-- Setup metadata for custom build tool --&amp;gt;&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;ItemGroup&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;DXCShader&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;Message&amp;gt;&lt;/span&gt;%(Filename)%(Extension)&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Message&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;Command&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&amp;quot;$(WDKBinRoot)\x86\dxc.exe&amp;quot;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-T&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;vs_6_0&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-E&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;vs_main&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;%(Identity)&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-Fh&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;%(Filename).vs.h&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-Vn&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;%(Filename)_vs
&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&amp;quot;$(WDKBinRoot)\x86\dxc.exe&amp;quot;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-T&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;ps_6_0&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-E&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;ps_main&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;%(Identity)&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-Fh&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;%(Filename).ps.h&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-Vn&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;%(Filename)_ps
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Command&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;Outputs&amp;gt;&lt;/span&gt;%(Filename).vs.h;%(Filename).ps.h&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Outputs&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;/DXCShader&amp;gt;&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/ItemGroup&amp;gt;&lt;/span&gt;

&lt;span class=&#34;cm&#34;&gt;&amp;lt;!-- Compile by forwarding to the Custom Build Tool infrastructure --&amp;gt;&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;CustomBuild&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Sources=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;@(DXCShader)&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now, given some valid HLSL source files in the project, this will invoke &lt;code&gt;dxc.exe&lt;/code&gt; twice on each
one—first compiling a vertex shader, then a pixel shader. The bytecode will be output as C arrays in
header files (&lt;code&gt;-Fh&lt;/code&gt; option). I’ve just put the output headers in the main project directory, but
in production you’d probably want to put them in a subdirectory somewhere.&lt;/p&gt;
&lt;p&gt;Let’s back up and look at the syntax in this snippet. First, the &lt;code&gt;&amp;lt;ItemGroup&amp;gt;&amp;lt;DXCShader&amp;gt;&lt;/code&gt; combo
basically says “iterate over the &lt;code&gt;DXCShader&lt;/code&gt; items”, i.e. the HLSL source files in the project.
Then what we’re doing is adding metadata: each of the child elements—&lt;code&gt;&amp;lt;Message&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;Command&amp;gt;&lt;/code&gt;,
and &lt;code&gt;&amp;lt;Outputs&amp;gt;&lt;/code&gt;—becomes a metadata key/value pair attached to a &lt;code&gt;DXCShader&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;%(Foo)&lt;/code&gt; syntax accesses item metadata (within a previously established context for “which item”,
which is here created by the iteration over the shaders). All MSBuild items have certain
&lt;a href=&#34;https://docs.microsoft.com/en-us/visualstudio/msbuild/msbuild-well-known-item-metadata?view=vs-2017&#34;&gt;built-in metadata&lt;/a&gt;
like path, filename, and extension; we’re building on those to construct additional
metadata, in the format expected by the &lt;code&gt;&amp;lt;CustomBuild&amp;gt;&lt;/code&gt; task. (It matches the metadata that would be
created if you set up the command line etc. manually in the Custom Build Tool property pages.)&lt;/p&gt;
&lt;p&gt;Incidentally, the &lt;code&gt;$(WDKBinRoot)&lt;/code&gt; variable (“property”, in MSBuild-ese) is the path to the Windows
SDK &lt;code&gt;bin&lt;/code&gt; folder, where lots of tools like &lt;code&gt;dxc&lt;/code&gt; live. It needs to be quoted because it can (and
usually does) contain spaces. You can discover properties like this one by running MSBuild with “diagnostic”
verbosity (in VS, go to Tools → Options → Projects and Solutions → Build and Run → “MSBuild project
build output verbosity”)—this will spit out all the defined properties plus a ton of logging about
which targets are running and what they’re doing.&lt;/p&gt;
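&lt;p&gt;Diagnostic verbosity produces a lot of output. For a quick look at one or two values, a throwaway
target with a &lt;code&gt;&amp;lt;Message&amp;gt;&lt;/code&gt; task can also help. A minimal sketch, which could be added temporarily to
&lt;code&gt;dxc.targets&lt;/code&gt; (the target name &lt;code&gt;DumpDXCInfo&lt;/code&gt; is made up):&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&amp;lt;Target Name=&amp;quot;DumpDXCInfo&amp;quot; BeforeTargets=&amp;quot;DXC&amp;quot;&amp;gt;
  &amp;lt;!-- $(...) expands a property --&amp;gt;
  &amp;lt;Message Importance=&amp;quot;High&amp;quot; Text=&amp;quot;WDKBinRoot = $(WDKBinRoot)&amp;quot; /&amp;gt;
  &amp;lt;!-- %(ItemType.Name) batches the task: one message per DXCShader item --&amp;gt;
  &amp;lt;Message Importance=&amp;quot;High&amp;quot; Text=&amp;quot;Shader %(DXCShader.Identity) has filename %(DXCShader.Filename)&amp;quot; /&amp;gt;
&amp;lt;/Target&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Outside of an &lt;code&gt;&amp;lt;ItemGroup&amp;gt;&lt;/code&gt; that already establishes the item context, the metadata reference has to be
qualified with the item type, as in &lt;code&gt;%(DXCShader.Filename)&lt;/code&gt;.&lt;/p&gt;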
&lt;p&gt;Finally, after setting up all the required metadata, we simply pass it to the &lt;code&gt;&amp;lt;CustomBuild&amp;gt;&lt;/code&gt; task.
(This task isn’t part of core MSBuild, but is defined in &lt;code&gt;Microsoft.Build.CPPTasks.Common.dll&lt;/code&gt;—an
extension plugin to MSBuild that comes with Visual Studio.) Again we see the &lt;code&gt;@(DXCShader)&lt;/code&gt; syntax,
meaning to pass in the list of all &lt;code&gt;DXCShader&lt;/code&gt; items in the project. Internally, &lt;code&gt;&amp;lt;CustomBuild&amp;gt;&lt;/code&gt;
iterates over it and invokes your specified command lines.&lt;/p&gt;
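&lt;p&gt;As mentioned above, the output headers currently land in the main project directory. One possible way
to move them, sketched here and not tested, is to prefix the paths with a directory property such as
&lt;code&gt;$(IntDir)&lt;/code&gt; (the project’s intermediate directory, which conventionally ends with a trailing
backslash). This assumes the directory already exists by the time the target runs, and that it has been added
to the C++ include paths.&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&amp;lt;ItemGroup&amp;gt;
  &amp;lt;DXCShader&amp;gt;
    &amp;lt;Message&amp;gt;%(Filename)%(Extension)&amp;lt;/Message&amp;gt;
    &amp;lt;!-- Same commands as before, but -Fh now points into $(IntDir); quoted in case of spaces --&amp;gt;
    &amp;lt;Command&amp;gt;
      &amp;quot;$(WDKBinRoot)\x86\dxc.exe&amp;quot; -T vs_6_0 -E vs_main %(Identity) -Fh &amp;quot;$(IntDir)%(Filename).vs.h&amp;quot; -Vn %(Filename)_vs
      &amp;quot;$(WDKBinRoot)\x86\dxc.exe&amp;quot; -T ps_6_0 -E ps_main %(Identity) -Fh &amp;quot;$(IntDir)%(Filename).ps.h&amp;quot; -Vn %(Filename)_ps
    &amp;lt;/Command&amp;gt;
    &amp;lt;!-- Outputs must name the same paths the commands write to --&amp;gt;
    &amp;lt;Outputs&amp;gt;$(IntDir)%(Filename).vs.h;$(IntDir)%(Filename).ps.h&amp;lt;/Outputs&amp;gt;
  &amp;lt;/DXCShader&amp;gt;
&amp;lt;/ItemGroup&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
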
&lt;h2 id=&#34;incremental-builds&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#incremental-builds&#34; title=&#34;Permalink to this section&#34;&gt;Incremental Builds&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At this point, we have a working custom build! We can simply add &lt;code&gt;.hlsl&lt;/code&gt; files to our project, and
they’ll automatically be compiled by &lt;code&gt;dxc&lt;/code&gt; as part of the build process, without us having to do
anything. &lt;em&gt;Hurrah!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;However, while working with this setup you will notice a couple of problems.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When you modify an HLSL source file, Visual Studio will &lt;em&gt;not&lt;/em&gt; reliably detect that it
    needs to recompile it. If the project was up-to-date before, hitting Build will do nothing!
    However, if you have also modified something else (such as a C++ source file), &lt;em&gt;then&lt;/em&gt; the build
    will pick up the shaders in addition.&lt;/li&gt;
&lt;li&gt;Anytime anything else gets built, &lt;em&gt;all&lt;/em&gt; the shaders get built. In other words, MSBuild doesn’t
    yet understand that if an individual shader is already up-to-date then it can be skipped.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Fortunately, we can easily fix these. But first, why are these problems happening at all?&lt;/p&gt;
&lt;p&gt;VS and MSBuild depend on &lt;a href=&#34;https://docs.microsoft.com/en-us/visualstudio/extensibility/visual-cpp-project-extensibility?view=vs-2017#tlog-files&#34;&gt;&lt;code&gt;.tlog&lt;/code&gt; (tracker log) files&lt;/a&gt;
to cache information about source file dependencies and efficiently determine whether a build is
up-to-date. Somewhere inside your build output directory there will be a folder full of these logs,
listing what source files have gotten built, what inputs they depended on (e.g. headers), and
what outputs they generated (e.g. object files). The problem is that our custom target isn’t
producing any &lt;code&gt;.tlog&lt;/code&gt;s.&lt;/p&gt;
&lt;p&gt;Conveniently for us, the &lt;code&gt;&amp;lt;CustomBuild&amp;gt;&lt;/code&gt; task supports &lt;code&gt;.tlog&lt;/code&gt; handling right out of the box; we
just have to turn it on! Change the &lt;code&gt;&amp;lt;CustomBuild&amp;gt;&lt;/code&gt; invocation in the targets file to this:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;cm&#34;&gt;&amp;lt;!-- Compile by forwarding to the Custom Build Tool infrastructure,&lt;/span&gt;
&lt;span class=&#34;cm&#34;&gt;     so it will take care of .tlogs --&amp;gt;&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;CustomBuild&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Sources=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;@(DXCShader)&amp;quot;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;MinimalRebuildFromTracking=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;TrackerLogDirectory=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;$(TLogLocation)&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That’s all there is to it—now, modified HLSL files will be properly detected and rebuilt, and
&lt;em&gt;unmodified&lt;/em&gt; ones will be properly detected and &lt;em&gt;not&lt;/em&gt; rebuilt. This also takes care of deleting the
previous output files when you do a clean build. This is one reason to prefer using the &lt;code&gt;&amp;lt;CustomBuild&amp;gt;&lt;/code&gt;
task rather than the simpler &lt;code&gt;&amp;lt;Exec&amp;gt;&lt;/code&gt; task (we’ll see another reason a bit later).&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Thanks to Olga Arkhipova at Microsoft for helping me figure out this part!&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&#34;header-dependencies&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#header-dependencies&#34; title=&#34;Permalink to this section&#34;&gt;Header Dependencies&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Now that we have dependencies hooked up for our custom toolchain, a logical next step is to look
into how we can specify extra input dependencies—so that our shaders can have &lt;code&gt;#include&lt;/code&gt;s, for
example, and modifications to the headers will automatically trigger rebuilds properly.&lt;/p&gt;
&lt;p&gt;The good news is that yes, we can do this by adding an &lt;code&gt;&amp;lt;AdditionalInputs&amp;gt;&lt;/code&gt; metadata key to our
&lt;code&gt;DXCShader&lt;/code&gt; items. Files listed there will get registered as inputs in the &lt;code&gt;.tlog&lt;/code&gt;, and the build
system will do the rest. The bad news is that there doesn’t seem to be an easy way to detect &lt;em&gt;on
a file-by-file level&lt;/em&gt; which additional inputs are needed.&lt;/p&gt;
&lt;p&gt;This is frustrating because Visual Studio actually includes a utility for tracking
file accesses in an external tool! It’s called &lt;code&gt;tracker.exe&lt;/code&gt; and lives somewhere in your VS
installation. You give it a command line, and it’ll detect all files opened for reading by the
launched process (presumably by injecting a DLL and detouring &lt;code&gt;CreateFile()&lt;/code&gt;, or something along
those lines). I believe this is what VS uses internally to track &lt;code&gt;#include&lt;/code&gt;s for C++—and it
would be perfect if we could get access to the same functionality for custom toolchains as well.&lt;/p&gt;
&lt;p&gt;Unfortunately, the &lt;code&gt;&amp;lt;CustomBuild&amp;gt;&lt;/code&gt; task &lt;em&gt;explicitly disables&lt;/em&gt; this tracking functionality. I was
able to find this out by using &lt;a href=&#34;https://github.com/icsharpcode/ILSpy&#34;&gt;ILSpy&lt;/a&gt; to decompile the
&lt;code&gt;Microsoft.Build.CPPTasks.Common.dll&lt;/code&gt;. It’s a .NET assembly, so it decompiles pretty cleanly, and
you can examine the innards of the &lt;code&gt;CustomBuild&lt;/code&gt; class. It contains this snippet, in the
&lt;code&gt;ExecuteTool()&lt;/code&gt; method:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;kt&#34;&gt;bool&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;trackFileAccess&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;base&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TrackFileAccess&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;base&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TrackFileAccess&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;false&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;span class=&#34;n&#34;&gt;num&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;base&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TrackerExecuteTool&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;pathToTool2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;responseFileCommands&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;commandLineCommands&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;
&lt;span class=&#34;k&#34;&gt;base&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TrackFileAccess&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;trackFileAccess&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That is, it’s turning off file access tracking before calling the base class
method that would otherwise invoke the tracker. I’m sure there’s a reason why they did that, but
sadly it’s stymied my attempts to get automatic &lt;code&gt;#include&lt;/code&gt; tracking to work for shaders.&lt;/p&gt;
&lt;p&gt;(We could also invoke &lt;code&gt;tracker.exe&lt;/code&gt; manually in our command line, but then we face the problem of
merging the tracker-generated &lt;code&gt;.tlog&lt;/code&gt; into that of the &lt;code&gt;&amp;lt;CustomBuild&amp;gt;&lt;/code&gt; task. They’re just text files,
so it’s potentially doable…but that is &lt;em&gt;way&lt;/em&gt; more programming than I’m prepared to attempt in an
XML-based scripting language.)&lt;/p&gt;
&lt;p&gt;Although we can’t get fine-grained file-by-file header dependencies, we can still set up &lt;em&gt;conservative&lt;/em&gt;
dependencies by making every HLSL source file depend on every header. This will result in rebuilding
all the shaders whenever any header is modified—but better to rebuild too much than not enough.
We can find all the headers using a wildcard pattern and an &lt;code&gt;&amp;lt;ItemGroup&amp;gt;&lt;/code&gt;. Add this to the DXC
&lt;code&gt;&amp;lt;Target&amp;gt;&lt;/code&gt;, before the “setup metadata” section:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;cm&#34;&gt;&amp;lt;!-- Find all shader headers (.hlsli files) --&amp;gt;&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;ItemGroup&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;ShaderHeader&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Include=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;*.hlsli&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/ItemGroup&amp;gt;&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;PropertyGroup&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;ShaderHeaders&amp;gt;&lt;/span&gt;@(ShaderHeader)&lt;span class=&#34;nt&#34;&gt;&amp;lt;/ShaderHeaders&amp;gt;&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/PropertyGroup&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You could also set this to find &lt;code&gt;.h&lt;/code&gt; files under a &lt;code&gt;Shaders&lt;/code&gt; subdirectory, or whatever you prefer.
The &lt;code&gt;**&lt;/code&gt; wildcard is available for recursively searching subdirectories, too.&lt;/p&gt;
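&lt;p&gt;As a rough sketch of that variation (untested here, and assuming the headers use a &lt;code&gt;.h&lt;/code&gt; extension and
live in a &lt;code&gt;Shaders&lt;/code&gt; folder relative to the project; both are placeholders for whatever layout you have), the
item group might look something like this:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&amp;lt;!-- Assumed layout: pick up every .h file under Shaders\, at any depth --&amp;gt;
&amp;lt;ItemGroup&amp;gt;
  &amp;lt;ShaderHeader Include=&amp;quot;Shaders\**\*.h&amp;quot; /&amp;gt;
&amp;lt;/ItemGroup&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;ShaderHeaders&lt;/code&gt; property group from the snippet above stays the same either way.&lt;/p&gt;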
&lt;p&gt;Then add this inside the &lt;code&gt;&amp;lt;ItemGroup&amp;gt;&amp;lt;DXCShader&amp;gt;&lt;/code&gt; section:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;AdditionalInputs&amp;gt;&lt;/span&gt;$(ShaderHeaders)&lt;span class=&#34;nt&#34;&gt;&amp;lt;/AdditionalInputs&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We have to do a little dance here, first forming the &lt;code&gt;ShaderHeader&lt;/code&gt; item list, then expanding it
into the &lt;code&gt;ShaderHeaders&lt;/code&gt; &lt;em&gt;property&lt;/em&gt;, and finally referencing that in the metadata. I’m not sure why,
but if I try to use &lt;code&gt;@(ShaderHeader)&lt;/code&gt; directly in the metadata it just comes out blank. Perhaps
it’s not allowed to have nested iteration over item lists in MSBuild.&lt;/p&gt;
&lt;p&gt;In any case, after making these changes and rebuilding, the build should now pick up any changes to
shader headers. &lt;em&gt;Woohoo!&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&#34;errorwarning-parsing&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#errorwarning-parsing&#34; title=&#34;Permalink to this section&#34;&gt;Error/Warning Parsing&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There’s just one more bit of sparkle we can easily add. When you compile C++ and you get an error
or warning, the VS IDE recognizes it and produces a clickable link that takes you to the source
location. If a custom build step emits error messages in the same format, they’ll be picked up as
well—but what if your custom toolchain has a different format?&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;dxc&lt;/code&gt; compiler emits errors and warnings in gcc/clang format, looking something like this:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;Shader.hlsl:12:15: error: cannot convert from &amp;#39;float3&amp;#39; to &amp;#39;float4&amp;#39;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It turns out that Visual Studio already does recognize this format (at least as of version 15.9),
which is great! But if it didn’t, or in case you’ve got a tool with some other message format, it turns
out you can provide a regular expression to find errors and warnings in the tool output. The regex
can even supply source file/line information, and the errors will become clickable in the IDE, just
as with C++. (This is all &lt;em&gt;totally undocumented&lt;/em&gt; and I only know about it because I spotted the
code while browsing through the decompiled CPPTasks DLL. If you want to take a look for yourself,
the juicy bit is the &lt;code&gt;VCToolTask.ParseLine()&lt;/code&gt; method.)&lt;/p&gt;
&lt;p&gt;This will use &lt;a href=&#34;https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference&#34;&gt;.NET regex syntax&lt;/a&gt;,
and in particular, expects a certain set of &lt;a href=&#34;https://docs.microsoft.com/en-us/dotnet/standard/base-types/grouping-constructs-in-regular-expressions#named_matched_subexpression&#34;&gt;named captures&lt;/a&gt;
to provide metadata. By way of example, here’s the regex I wrote for gcc/clang-format errors:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;(?&amp;#39;FILENAME&amp;#39;.+):(?&amp;#39;LINE&amp;#39;\d+):(?&amp;#39;COLUMN&amp;#39;\d+): (?&amp;#39;CATEGORY&amp;#39;error|warning): (?&amp;#39;TEXT&amp;#39;.*)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;FILENAME&lt;/code&gt;, &lt;code&gt;LINE&lt;/code&gt;, etc. are the names the parsing code expects for the metadata. There’s one more
I didn’t use: &lt;code&gt;CODE&lt;/code&gt;, for an error code (like &lt;a href=&#34;https://docs.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2440?view=vs-2017&#34;&gt;C2440&lt;/a&gt;,
etc.). The only required one is &lt;code&gt;CATEGORY&lt;/code&gt;, without which the message won’t be clickable (and it
must be one of the words “error”, “warning”, or “note”); all the others are optional.&lt;/p&gt;
&lt;p&gt;To use it, pass the regex to the &lt;code&gt;&amp;lt;CustomBuild&amp;gt;&lt;/code&gt; task like so:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;CustomBuild&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Sources=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;@(DXCShader)&amp;quot;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;MinimalRebuildFromTracking=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;TrackerLogDirectory=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;$(TLogLocation)&amp;quot;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;ErrorListRegex=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;(?&amp;#39;FILENAME&amp;#39;.+):(?&amp;#39;LINE&amp;#39;\d+):(?&amp;#39;COLUMN&amp;#39;\d+): (?&amp;#39;CATEGORY&amp;#39;error|warning): (?&amp;#39;TEXT&amp;#39;.*)&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
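
&lt;p&gt;The same attribute should carry over to other message formats. As an untested sketch, suppose some hypothetical
tool prints messages like &lt;code&gt;error E1234 at Shader.hlsl:12 - unexpected token&lt;/code&gt; (this format is made up purely
for illustration). A regex that also fills in the &lt;code&gt;CODE&lt;/code&gt; capture might then look something like:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&amp;lt;!-- Hypothetical message format; adjust the regex to match your tool&amp;#39;s real output --&amp;gt;
&amp;lt;CustomBuild
  Sources=&amp;quot;@(DXCShader)&amp;quot;
  MinimalRebuildFromTracking=&amp;quot;true&amp;quot;
  TrackerLogDirectory=&amp;quot;$(TLogLocation)&amp;quot;
  ErrorListRegex=&amp;quot;(?&amp;#39;CATEGORY&amp;#39;error|warning) (?&amp;#39;CODE&amp;#39;[A-Z]\d+) at (?&amp;#39;FILENAME&amp;#39;.+):(?&amp;#39;LINE&amp;#39;\d+) - (?&amp;#39;TEXT&amp;#39;.*)&amp;quot; /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;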

&lt;h2 id=&#34;example-project&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#example-project&#34; title=&#34;Permalink to this section&#34;&gt;Example Project&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Here’s a complete VS2017 project with all the features we’ve discussed, a couple demo shaders, and a
C++ file that includes the compiled bytecode (just to show that works).&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;biglink&#34; href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/buildcust3.zip&#34;&gt;Download Example Project (.zip, 4.3 KB)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And for completeness, here’s the final contents of &lt;code&gt;dxc.targets&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;codehilite&#34;&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=&#34;cp&#34;&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot; encoding=&amp;quot;utf-8&amp;quot;?&amp;gt;&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;Project&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;xmlns=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;http://schemas.microsoft.com/developer/msbuild/2003&amp;quot;&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;ItemGroup&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;cm&#34;&gt;&amp;lt;!-- Include definitions from dxc.xml, which defines the DXCShader item. --&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;PropertyPageSchema&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Include=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;$(MSBuildThisFileDirectory)dxc.xml&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;cm&#34;&gt;&amp;lt;!-- Hook up DXCShader items to be built by the DXC target. --&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;AvailableItemName&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Include=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;DXCShader&amp;quot;&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;Targets&amp;gt;&lt;/span&gt;DXC&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Targets&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;/AvailableItemName&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;/ItemGroup&amp;gt;&lt;/span&gt;

&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;Target&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Name=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;DXC&amp;quot;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Condition=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;&amp;#39;@(DXCShader)&amp;#39; != &amp;#39;&amp;#39;&amp;quot;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;BeforeTargets=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;ClCompile&amp;quot;&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;gt;&lt;/span&gt;

&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;Message&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Importance=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;High&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Text=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;Building shaders!!!&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;/&amp;gt;&lt;/span&gt;

&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;cm&#34;&gt;&amp;lt;!-- Find all shader headers (.hlsli files) --&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;ItemGroup&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;ShaderHeader&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Include=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;*.hlsli&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;/ItemGroup&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;PropertyGroup&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;ShaderHeaders&amp;gt;&lt;/span&gt;@(ShaderHeader)&lt;span class=&#34;nt&#34;&gt;&amp;lt;/ShaderHeaders&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;/PropertyGroup&amp;gt;&lt;/span&gt;

&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;cm&#34;&gt;&amp;lt;!-- Setup metadata for custom build tool --&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;ItemGroup&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;DXCShader&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;Message&amp;gt;&lt;/span&gt;%(Filename)%(Extension)&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Message&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;Command&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;          &lt;/span&gt;&amp;quot;$(WDKBinRoot)\x86\dxc.exe&amp;quot;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-T&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;vs_6_0&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-E&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;vs_main&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;%(Identity)&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-Fh&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;%(Filename).vs.h&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-Vn&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;%(Filename)_vs
&lt;span class=&#34;w&#34;&gt;          &lt;/span&gt;&amp;quot;$(WDKBinRoot)\x86\dxc.exe&amp;quot;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-T&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;ps_6_0&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-E&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;ps_main&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;%(Identity)&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-Fh&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;%(Filename).ps.h&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;-Vn&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;%(Filename)_ps
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Command&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;AdditionalInputs&amp;gt;&lt;/span&gt;$(ShaderHeaders)&lt;span class=&#34;nt&#34;&gt;&amp;lt;/AdditionalInputs&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;Outputs&amp;gt;&lt;/span&gt;%(Filename).vs.h;%(Filename).ps.h&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Outputs&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;/DXCShader&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;/ItemGroup&amp;gt;&lt;/span&gt;

&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;cm&#34;&gt;&amp;lt;!-- Compile by forwarding to the Custom Build Tool infrastructure,&lt;/span&gt;
&lt;span class=&#34;cm&#34;&gt;         so it will take care of .tlogs and error/warning parsing --&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;CustomBuild&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;Sources=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;@(DXCShader)&amp;quot;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;MinimalRebuildFromTracking=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;TrackerLogDirectory=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;$(TLogLocation)&amp;quot;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;      &lt;/span&gt;&lt;span class=&#34;na&#34;&gt;ErrorListRegex=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;quot;(?&amp;#39;FILENAME&amp;#39;.+):(?&amp;#39;LINE&amp;#39;\d+):(?&amp;#39;COLUMN&amp;#39;\d+): (?&amp;#39;CATEGORY&amp;#39;error|warning): (?&amp;#39;TEXT&amp;#39;.*)&amp;quot;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&#34;w&#34;&gt;  &lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Target&amp;gt;&lt;/span&gt;
&lt;span class=&#34;nt&#34;&gt;&amp;lt;/Project&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&#34;the-next-level&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#the-next-level&#34; title=&#34;Permalink to this section&#34;&gt;The Next Level&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At this point, we have a pretty usable MSBuild customization for compiling shaders, or using other
kinds of custom toolchains! I’m pretty happy with it. However, there are still a couple of areas for
improvement.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;As mentioned before, I’d like to get file access tracking to work so we can have exact
    dependencies for included files, rather than conservative (overly broad) dependencies.&lt;/li&gt;
&lt;li&gt;I haven’t done anything with parallel building. Currently, &lt;code&gt;&amp;lt;CustomBuild&amp;gt;&lt;/code&gt; tasks are run one at a
    time. There &lt;em&gt;is&lt;/em&gt; a &lt;code&gt;&amp;lt;ParallelCustomBuild&amp;gt;&lt;/code&gt; task in the CPPTasks assembly…unfortunately, it
    doesn’t support &lt;code&gt;.tlog&lt;/code&gt; updating or the error/warning regex, so it’s not directly usable here.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To obtain these features, I think I’d need to write my own build extension in C#, defining a custom
task and calling it in place of &lt;code&gt;&amp;lt;CustomBuild&amp;gt;&lt;/code&gt; in the targets file. It might not be too hard to get
that working, but I haven’t attempted it yet.&lt;/p&gt;
&lt;p&gt;In the meantime, now that the hard work of circumventing the weird gotchas and reverse-engineering
the undocumented innards has been done, it should be pretty easy to adapt this &lt;code&gt;.targets&lt;/code&gt; setup to
other needs for code generation or external tools, and have them act mostly like first-class
citizens in our Visual Studio builds. Cheers!&lt;/p&gt;</description>
			</item>
			<item>
				<title>Mesh Shader Possibilities</title>
				<link>https://www.reedbeta.com/blog/mesh-shader-possibilities/</link>
				<guid>http://reedbeta.com/blog/mesh-shader-possibilities/</guid>
				<dc:creator>Nathan Reed</dc:creator>
<pubDate>Sat, 29 Sep 2018 11:42:26 -0700</pubDate><comments>https://www.reedbeta.com/blog/mesh-shader-possibilities/#comments</comments>					<category>Coding</category>
					<category>GPU</category>
					<category>Graphics</category>
				<description>&lt;p&gt;NVIDIA recently announced their latest GPU architecture, called Turing. Although its headlining feature is
&lt;a href=&#34;https://arstechnica.com/gadgets/2018/08/microsoft-announces-the-next-step-in-gaming-graphics-directx-raytracing/&#34;&gt;hardware-accelerated ray tracing&lt;/a&gt;,
Turing also includes &lt;a href=&#34;https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/&#34;&gt;several other developments&lt;/a&gt;
that look quite intriguing in their own right.&lt;/p&gt;
&lt;p&gt;One of these is the new concept of &lt;a href=&#34;https://devblogs.nvidia.com/introduction-turing-mesh-shaders/&#34;&gt;&lt;em&gt;mesh shaders&lt;/em&gt;&lt;/a&gt;,
details of which dropped a couple weeks ago—and the graphics programming community was agog, with many
enthusiastic discussions taking place on Twitter and elsewhere. So what are mesh shaders (and their
counterparts, task shaders), why are graphics programmers so excited about them, and what might we
be able to do with them?&lt;/p&gt;
&lt;!--more--&gt;

&lt;h2 id=&#34;the-gpu-geometry-pipeline-has-gotten-cluttered&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/mesh-shader-possibilities/#the-gpu-geometry-pipeline-has-gotten-cluttered&#34; title=&#34;Permalink to this section&#34;&gt;The GPU Geometry Pipeline Has Gotten Cluttered&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The process of submitting geometry—triangles to be drawn—to the GPU has a simple underlying
paradigm: you put your vertices into a buffer, point the GPU at it, and issue a draw call to say
how many primitives to render. The vertices get slurped linearly out of the buffer, each is
processed by a vertex shader, the triangles are rasterized and shaded, and Bob’s your uncle.&lt;/p&gt;
&lt;p&gt;But over decades of GPU development, various extra features have gotten bolted onto this basic pipeline
in the name of greater performance and efficiency. Indexed triangles and vertex caches were created to exploit
vertex reuse. Complex vertex stream format descriptions are needed to prepare data for shading.
Instancing, and later multi-draw, allowed certain sets of draw calls to be combined together;
indirect draws could be generated on the GPU itself. Then came
the extra shader stages: geometry shaders, to allow programmable operations on primitives and even
inserting or deleting primitives on the fly, and then tessellation shaders, letting you submit a
low-res mesh and dynamically subdivide it to a programmable level.&lt;/p&gt;
&lt;p&gt;While these features and more were all added for good reasons (or at least what &lt;em&gt;seemed&lt;/em&gt; like
good reasons at the time), the compound of all of them has become unwieldy. Which subset of the
many available options do you reach for in a given situation? Will your choice be efficient across
all the GPU architectures your software must run on?&lt;/p&gt;
&lt;p&gt;Moreover, this elaborate pipeline is still not as flexible as we would sometimes like—or, where
flexible, it is not performant. Instancing can only draw copies of a single mesh at a time;
multi-draw is still inefficient for large numbers of small draws. Geometry shaders’ programming model is &lt;a href=&#34;http://www.joshbarczak.com/blog/?p=667&#34;&gt;not
conducive to efficient implementation&lt;/a&gt; on wide SIMD cores in
GPUs, and its &lt;a href=&#34;https://fgiesen.wordpress.com/2011/07/20/a-trip-through-the-graphics-pipeline-2011-part-10/&#34;&gt;input/output buffering presents difficulties too&lt;/a&gt;.
Hardware tessellation, though very handy for certain things, is often &lt;a href=&#34;https://www.sebastiansylvan.com/post/the-problem-with-tessellation-in-directx-11/&#34;&gt;difficult to use well&lt;/a&gt;
due to the limited granularity at which you can set tessellation factors, the limited set of baked-in
&lt;a href=&#34;/blog/tess-quick-ref/&#34;&gt;tessellation modes&lt;/a&gt;, and performance issues on some GPU architectures.&lt;/p&gt;
&lt;h2 id=&#34;simplicity-is-golden&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/mesh-shader-possibilities/#simplicity-is-golden&#34; title=&#34;Permalink to this section&#34;&gt;Simplicity Is Golden&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Mesh shaders represent a radical simplification of the geometry pipeline. With a mesh shader
enabled, all the shader stages and fixed-function features described above are swept away. Instead, we get
a clean, straightforward pipeline using a compute-shader-like programming model. Importantly, this
new pipeline is both highly flexible—enough to handle the existing geometry tasks in a typical game,
plus enable new techniques that are challenging to do on the GPU today—&lt;em&gt;and&lt;/em&gt; it looks
like it should be quite performance-friendly, with no apparent architectural barriers to efficient
GPU execution.&lt;/p&gt;
&lt;p&gt;Like a compute shader, a mesh shader defines work groups of parallel-running threads, and they can
communicate via on-chip shared memory as well as &lt;a href=&#34;http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2017/07/GDC2017-Wave-Programming-D3D12-Vulkan.pdf&#34;&gt;wave intrinsics&lt;/a&gt;.
In lieu of a draw call, the app launches some number of mesh shader work groups. Each work group
is responsible for writing out a small, self-contained chunk of geometry, called a
“meshlet”, expressed in arrays of vertex attributes and corresponding indices. These meshlets
then get tossed directly into the rasterizer, and Bob’s your uncle.&lt;/p&gt;
&lt;p&gt;(More details can be found in &lt;a href=&#34;https://devblogs.nvidia.com/introduction-turing-mesh-shaders/&#34;&gt;NVIDIA’s blog post&lt;/a&gt;,
a &lt;a href=&#34;http://on-demand.gputechconf.com/siggraph/2018/video/sig1811-3-christoph-kubisch-mesh-shaders.html&#34;&gt;talk by Christoph Kubisch&lt;/a&gt;,
and the &lt;a href=&#34;https://www.khronos.org/registry/OpenGL/extensions/NV/NV_mesh_shader.txt&#34;&gt;OpenGL extension spec&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;The appealing thing about this model is how data-driven and freeform it is. The mesh shader pipeline
has very relaxed expectations about the shape of your data and the kinds of things you’re going to do.
Everything’s up to the programmer: you can pull the vertex and index data from buffers, generate
them algorithmically, or any combination.&lt;/p&gt;
&lt;p&gt;At the same time, the mesh shader model sidesteps the issues that hampered geometry shaders, by explicitly embracing
SIMD execution (in the form of the compute “work group” abstraction). Instead of each shader &lt;em&gt;thread&lt;/em&gt;
generating geometry on its own—which leads to divergence, and large input/output data sizes—we
have the whole work group outputting a meshlet cooperatively. This means we can use
compute-style tricks, like: first do some work on the vertices in parallel, then have a barrier, then work on
the triangles in parallel. It also means the input/output bandwidth needs are a lot more reasonable.
And, because meshlets are indexed triangle lists, they don’t break vertex reuse, as geometry shaders often did.&lt;/p&gt;
&lt;h2 id=&#34;an-upgrade-path&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/mesh-shader-possibilities/#an-upgrade-path&#34; title=&#34;Permalink to this section&#34;&gt;An Upgrade Path&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The other really neat thing about mesh shaders is that they don’t require you to drastically rework
how your game engine handles geometry to take advantage of them. It looks like it should be pretty
easy to convert most common geometry types to mesh shaders, making it an approachable upgrade path for
developers.&lt;/p&gt;
&lt;p&gt;(You don’t have to convert &lt;em&gt;everything&lt;/em&gt; to mesh shaders straight away, though; it’s possible
to switch between the old geometry pipeline and the new mesh-shader-based one at different points in
the frame.)&lt;/p&gt;
&lt;p&gt;Suppose you have an ordinary authored mesh that you want to load and render. You’ll
need to break it up into meshlets, which have a static maximum size declared in the
shader—NVIDIA’s blog post recommends 64 vertices and 126 triangles as a default. How do we do this?&lt;/p&gt;
&lt;p&gt;Fortunately, most game engines currently do some form of &lt;a href=&#34;https://tomforsyth1000.github.io/papers/fast_vert_cache_opt.html&#34;&gt;vertex cache optimization&lt;/a&gt;,
which already organizes the primitives by locality—triangles sharing one or two vertices will tend
to be close together in the index buffer. So, a quite viable
strategy for creating meshlets is: just scan the index buffer linearly, accumulating the set of
vertices used, until you hit either 64 vertices or 126 triangles; reset and repeat until you’ve gone
through the whole mesh. This could be done at art build time, or it’s simple enough that you could even do it
in the engine at level load time.&lt;/p&gt;
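&lt;p&gt;To make the linear scan concrete, here’s a minimal CPU-side sketch in Python. It’s an illustration
under my own assumptions, not code from NVIDIA or any engine: the name &lt;code&gt;build_meshlets&lt;/code&gt; and the
(vertices, local indices) pair layout are hypothetical, and the input is assumed to be a plain triangle-list
index buffer that has already been through vertex cache optimization.&lt;/p&gt;

```python
def build_meshlets(indices, max_vertices=64, max_triangles=126):
    # Split a triangle-list index buffer into meshlets with one linear scan.
    # Returns a list of (vertices, local_indices) pairs, where:
    #   vertices      = indices into the original vertex buffer, in first-use order
    #   local_indices = flat triangle list indexing into that meshlet's vertices
    # Assumes max_vertices is at least 3, so a single triangle always fits.
    if len(indices) % 3 != 0:
        raise ValueError("index buffer length must be a multiple of 3")

    meshlets = []
    vertices, local_indices, remap = [], [], {}

    for t in range(0, len(indices), 3):
        tri = indices[t:t + 3]
        # Vertices this triangle would add to the current meshlet.
        new_verts = {v for v in tri if v not in remap}
        full = (len(vertices) + len(new_verts) > max_vertices
                or len(local_indices) // 3 + 1 > max_triangles)
        if local_indices and full:
            # Adding this triangle would overflow a limit: reset and repeat.
            meshlets.append((vertices, local_indices))
            vertices, local_indices, remap = [], [], {}
        for v in tri:
            if v not in remap:
                remap[v] = len(vertices)
                vertices.append(v)
            local_indices.append(remap[v])

    if local_indices:
        meshlets.append((vertices, local_indices))
    return meshlets
```

&lt;p&gt;How well the meshlets fill up depends entirely on the ordering of the incoming index buffer, which is
why running this after vertex cache optimization matters.&lt;/p&gt;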
&lt;p&gt;Alternatively, vertex cache optimization algorithms can probably be modified to produce meshlets directly.
For GPUs without mesh shader support, you can concatenate all the meshlet vertex buffers
together, and rapidly generate a traditional index buffer by offsetting and concatenating all the
meshlet index buffers. It’s pretty easy to go back and forth.&lt;/p&gt;
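&lt;p&gt;The fallback direction can be sketched in a few lines of Python, too. Again, this is just an
illustration with hypothetical names: &lt;code&gt;flatten_meshlets&lt;/code&gt; is my own, and each meshlet is assumed
to be a pair of (vertex list, flat list of meshlet-local indices).&lt;/p&gt;

```python
def flatten_meshlets(meshlets):
    # Concatenate per-meshlet data into one vertex buffer and one
    # triangle-list index buffer for the traditional geometry pipeline.
    # Each meshlet is assumed to be a (vertices, local_indices) pair.
    vertex_buffer = []
    index_buffer = []
    for vertices, local_indices in meshlets:
        base = len(vertex_buffer)  # where this meshlet's vertices start
        vertex_buffer.extend(vertices)
        # Offset the meshlet-local indices into the combined vertex buffer.
        index_buffer.extend(base + i for i in local_indices)
    return vertex_buffer, index_buffer
```

&lt;p&gt;Note that a vertex shared by two meshlets ends up duplicated in the combined vertex buffer; that’s
the price of keeping each meshlet self-contained.&lt;/p&gt;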
&lt;p&gt;In either case, the mesh shader would be mostly just acting as a vertex shader, with some extra
code to fetch vertex and index data from their buffers and plug them into the mesh outputs.&lt;/p&gt;
&lt;p&gt;What about other kinds of geometry found in games?&lt;/p&gt;
&lt;p&gt;Instanced draws are straightforward: multiply the meshlet count by the instance count and put in a bit of
shader logic to hook up instance parameters. A more interesting case is multi-draw, where we want
to draw a lot of meshes that &lt;em&gt;aren’t&lt;/em&gt; all copies of the same thing. For this, we can employ
&lt;em&gt;task shaders&lt;/em&gt;—a secondary feature of the mesh shader pipeline. Task shaders
add an extra layer of compute-style work groups, running before the mesh shader, and they control
&lt;em&gt;how many&lt;/em&gt; mesh shader work groups to launch. They can also write output variables to be consumed by the
mesh shader. A very efficient multi-draw should be possible by launching task shaders with a thread
per draw, which in turn launch the mesh shaders for all the individual draws.&lt;/p&gt;
&lt;p&gt;If we need to draw a lot of &lt;em&gt;very&lt;/em&gt; small meshes, such as quads for particles/imposters/text/point-based rendering,
or boxes for occlusion tests / projected decals and whatnot, then we can pack a bunch of them
into each mesh shader workgroup. The geometry can be generated entirely in-shader rather than relying
on a pre-initialized index buffer from the CPU. (This was one of the original use cases that, it was
hoped, could be done with geometry shaders—e.g. submitting point primitives, and having the GS expand them
into quads.) There’s also a lot of flexibility to do stuff with variable topology, like particle
beams/strips/ribbons, which would otherwise need to be generated either on the CPU or in a separate
compute pre-pass.&lt;/p&gt;
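&lt;p&gt;As a sketch of the quad case: with a 64-vertex meshlet limit, 16 quads (32 triangles) fit in one
meshlet, and both the corners and the indices follow from the quad number alone. The Python below only
simulates on the CPU what a work group might compute; the function name, the 2D positions, and the corner
ordering are my own assumptions, and in a real mesh shader each thread would handle one quad or one vertex
rather than looping.&lt;/p&gt;

```python
QUADS_PER_MESHLET = 64 // 4  # 16 quads fit within a 64-vertex meshlet


def quad_meshlet(centers, half_size):
    # Expand up to 16 (x, y) points into quads; returns (vertices, indices).
    if len(centers) > QUADS_PER_MESHLET:
        raise ValueError("too many quads for one meshlet")
    corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
    vertices, indices = [], []
    for q, (cx, cy) in enumerate(centers):
        base = 4 * q  # first vertex of quad q
        vertices.extend((cx + dx * half_size, cy + dy * half_size)
                        for dx, dy in corners)
        # Two counter-clockwise triangles per quad; no index buffer is read.
        indices.extend([base, base + 1, base + 2,
                        base + 2, base + 1, base + 3])
    return vertices, indices
```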
&lt;p&gt;(By the way, the &lt;em&gt;other&lt;/em&gt; original use case that, it was hoped, could be done with geometry shaders
was multi-view rendering: drawing the same geometry to, say, multiple faces of a cubemap or slices
of a cascaded shadow map within a single draw call. You could do that with mesh shaders, too—but
Turing actually has a separate hardware multi-view capability for these applications.)&lt;/p&gt;
&lt;p&gt;What about tessellated meshes?&lt;/p&gt;
&lt;p&gt;The two-layer structure of task and mesh shaders is broadly
similar to that of tessellation hull and domain shaders. While it doesn’t appear that mesh shaders
have any kind of access to the fixed-function tessellator unit, it’s also
not too hard to imagine that we could write code in task/mesh shaders to reproduce tessellation
functionality (or at least some of it). Figuring out the details would be a bit of a research project
for sure—maybe someone has already worked on this?—and perf would be a question mark. However,
we’d get the benefit of being able to &lt;em&gt;change&lt;/em&gt; how tessellation works, instead of being stuck with
whatever Microsoft decided on in the late 2000s.&lt;/p&gt;
&lt;h2 id=&#34;new-possibilities&#34;&gt;&lt;a href=&#34;https://www.reedbeta.com/blog/mesh-shader-possibilities/#new-possibilities&#34; title=&#34;Permalink to this section&#34;&gt;New Possibilities&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It’s great that mesh shaders can subsume our current geometry tasks, and in some cases make them
more efficient. But mesh shaders also open up possibilities for new kinds of geometry processing
that wouldn’t have been feasible on the GPU before, or would have required expensive compute
pre-passes storing data out to memory and then reading it back in through the traditional geometry
pipeline.&lt;/p&gt;
&lt;p&gt;With our meshes already in meshlet form, we can do &lt;a href=&#34;https://www.slideshare.net/gwihlidal/optimizing-the-graphics-pipeline-with-compute-gdc-2016&#34;&gt;finer-grained culling&lt;/a&gt;
at the meshlet level, and even at the triangle level within each meshlet. With task shaders, we can
potentially do mesh LOD selection on the GPU, and if we want to get fancy we could even try dynamically
packing together very small draws (from coarse LODs) to get better meshlet utilization.&lt;/p&gt;
&lt;p&gt;In place of tile-based forward lighting, or as an extension to it, it might be useful to cull
lights (and projected decals, etc.) per meshlet, assuming there’s a good way to pass the variable-size
light list from a mesh shader down to the fragment shader. (This suggestion comes from &lt;a href=&#34;https://twitter.com/sebaaltonen&#34;&gt;Seb Aaltonen&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Having access to the topology in the mesh shader should enable us to calculate dynamic normals,
tangents, and curvatures for a mesh that’s deforming due to complex skinning, displacement mapping,
or procedural vertex animation. We can also do voxel meshing, or isosurface extraction—marching
cubes or tetrahedra, plus generating normals etc. for the isosurface—directly in a mesh shader,
for rendering fluids and volumetric data.&lt;/p&gt;
&lt;p&gt;Geometry for hair/fur, foliage, or other surface cover might be feasible to generate on the fly,
with view-dependent detail.&lt;/p&gt;
&lt;p&gt;3D modeling and CAD apps may be able to apply mesh shaders to dynamically triangulate quad meshes or
n-gon meshes, as well as things like dynamically insetting/outsetting geometry for
visualizations.&lt;/p&gt;
&lt;p&gt;For rendering displacement-mapped terrain, water, and so forth, mesh shaders may be able to assist
us with &lt;a href=&#34;https://developer.nvidia.com/gpugems/GPUGems2/gpugems2_chapter02.html&#34;&gt;geometry clipmaps&lt;/a&gt;
and geomorphing; they might also be interesting for &lt;a href=&#34;http://hhoppe.com/proj/vdrpm/&#34;&gt;progressive meshing&lt;/a&gt;
schemes.&lt;/p&gt;
&lt;p&gt;And last but not least, we might be able to render &lt;a href=&#34;https://ia601908.us.archive.org/16/items/GDC2014Brainerd/GDC2014-Brainerd.pdf&#34;&gt;Catmull–Clark subdivision surfaces&lt;/a&gt;,
or other subdivision schemes, more easily and efficiently than is possible on the GPU today.&lt;/p&gt;
&lt;p&gt;To be clear, a great deal of the above is speculation and handwaving on my part—I don’t want to
mislead you that all of these things are &lt;em&gt;for sure&lt;/em&gt; doable with the new mesh and task shader
pipeline. There will certainly be algorithmic difficulties and architectural hindrances that will
come up as graphics programmers have a chance to dig into this. Still, I’m quite excited to see what
people will do with this capability over the next few years, and I hope and expect that it won’t be
an NVIDIA-exclusive feature for too long.&lt;/p&gt;</description>
			</item>
	</channel>
</rss>