In this project, I implemented the core routines of a path tracing rendering program. Path tracing is a powerful physically-based rendering technique that simulates the behavior of light in order to produce accurate global illumination and thus photorealistic images.
Given a geometric representation of a 3D scene and a camera position, path tracing works by casting rays through each pixel in the output image and averaging their values to calculate the final color of each pixel. The value of each ray is the sum of direct and indirect lighting: whenever a ray intersects a surface, the direct lighting at that point is computed by casting shadow ray(s) toward the light sources in the scene, while indirect lighting is computed by recursively casting a new ray from the hit point in a random direction.
This is one of the coolest projects I’ve ever done. Path tracing is a simple yet brilliant idea that draws on real-world physics (along with some liberties, namely reversing the flow of light) to produce stunning results. This project gave me insight into the core lighting techniques used in modern graphics and animation. I also have a new appreciation for the acceleration data structures and mathematical techniques developed to make path tracing computationally feasible.
We begin by filling in the core rendering function `PathTracer::raytrace_pixel()`. The rendering portion of the program works by dividing the output image into tiles, which are placed on a work queue processed by a user-specified number of threads. Each thread calls `raytrace_pixel()` for each pixel inside the tile's extent.

Given integer pixel coordinates $(x, y)$ and the desired number of samples per pixel, we cast rays through $(x + r_x,\, y + r_y)$ (normalized by the image width and height), where $r_x, r_y$ are randomly sampled from $[0, 1)$, by calling `camera->generate_ray(...)`. If we are only sampling once per pixel, we simply choose the center of the pixel, i.e. $(x + 0.5,\, y + 0.5)$. We then call `est_radiance_global_illumination(...)` to get the radiance along the ray, and finally average the results from all samples to return a final `Spectrum` (an RGB color value).
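As a rough sketch of this sampling loop (member names such as `ns_aa`, `gridSampler`, and `sampleBuffer` are assumptions about the starter code rather than its exact API):

```cpp
// Sketch only: ns_aa (samples per pixel), gridSampler, and sampleBuffer are
// assumed names for the starter code's members.
Spectrum PathTracer::raytrace_pixel(size_t x, size_t y) {
  Spectrum total(0, 0, 0);
  int num_samples = ns_aa;
  for (int i = 0; i < num_samples; i++) {
    // With a single sample we shoot through the pixel center; otherwise jitter.
    Vector2D r = (num_samples == 1) ? Vector2D(0.5, 0.5)
                                    : gridSampler->get_sample();
    // Normalize to [0, 1]^2 image coordinates before handing off to the camera.
    Ray ray = camera->generate_ray((x + r.x) / sampleBuffer.w,
                                   (y + r.y) / sampleBuffer.h);
    total += est_radiance_global_illumination(ray);
  }
  return total * (1.0 / num_samples);  // average of the per-sample radiance
}
```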
The camera has its own coordinate system in which the camera sits at the origin looking along the $-z$ axis. Given the camera's field of view angles `hFov` and `vFov` (in radians), we define the sensor plane one unit along the view direction, with its bottom left corner at $\left(-\tan\frac{\text{hFov}}{2},\ -\tan\frac{\text{vFov}}{2},\ -1\right)$ and its top right corner at $\left(\tan\frac{\text{hFov}}{2},\ \tan\frac{\text{vFov}}{2},\ -1\right)$.

`Camera::generate_ray()` takes in the point $(x, y)$ passed in from `raytrace_pixel()`. The input is converted into camera coordinates such that $(0, 0)$ maps to the bottom left and $(1, 1)$ maps to the top right corner of the sensor plane. The point is then converted into world space with the transformation matrix `c2w`; this (normalized) vector becomes the direction of the ray, while the origin is simply the camera's position `pos`.
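A minimal sketch of the ray generation under these conventions (assuming `hFov` and `vFov` are already in radians, and that `Vector3D`, `Matrix3x3`, and `Ray` behave like the project's math types):

```cpp
#include <cmath>

// Sketch only: maps (x, y) in [0, 1]^2 onto the sensor plane and transforms
// the resulting direction into world space.
Ray Camera::generate_ray(double x, double y) const {
  // Sensor plane sits at z = -1 in camera space, spanning +/- tan(fov / 2).
  double sensor_x = (2 * x - 1) * tan(0.5 * hFov);
  double sensor_y = (2 * y - 1) * tan(0.5 * vFov);
  Vector3D dir_camera(sensor_x, sensor_y, -1);
  // Rotate into world space with c2w and normalize; the ray starts at pos.
  Vector3D dir_world = (c2w * dir_camera).unit();
  return Ray(pos, dir_world);
}
```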
At the core of ray tracing is ray-scene intersection. Information about intersections allows us to determine the visibility, shadows, lighting, and more of objects from the camera’s point of view.
A ray is parametrized by $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$ for $t \in [t_{\min}, t_{\max}]$, where $\mathbf{o}$ is the ray's origin and $\mathbf{d}$ is its (unit) direction.
As the basis of polygon meshes, the triangle is the most important primitive. Given a triangle with vertices $\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2$, a point inside the triangle can be written in barycentric coordinates as $(1 - b_1 - b_2)\,\mathbf{p}_0 + b_1\,\mathbf{p}_1 + b_2\,\mathbf{p}_2$. To calculate the values associated with an intersection, we set

$$\mathbf{o} + t\mathbf{d} = (1 - b_1 - b_2)\,\mathbf{p}_0 + b_1\,\mathbf{p}_1 + b_2\,\mathbf{p}_2$$

and solve for the unknowns $t$, $b_1$, and $b_2$.

The Möller–Trumbore algorithm is a fast method for this purpose. The solution is

$$\begin{bmatrix} t \\ b_1 \\ b_2 \end{bmatrix} = \frac{1}{\mathbf{s}_1 \cdot \mathbf{e}_1} \begin{bmatrix} \mathbf{s}_2 \cdot \mathbf{e}_2 \\ \mathbf{s}_1 \cdot \mathbf{s} \\ \mathbf{s}_2 \cdot \mathbf{d} \end{bmatrix}$$

where

$$\mathbf{e}_1 = \mathbf{p}_1 - \mathbf{p}_0, \quad \mathbf{e}_2 = \mathbf{p}_2 - \mathbf{p}_0, \quad \mathbf{s} = \mathbf{o} - \mathbf{p}_0, \quad \mathbf{s}_1 = \mathbf{d} \times \mathbf{e}_2, \quad \mathbf{s}_2 = \mathbf{s} \times \mathbf{e}_1.$$

A full derivation can be found here, but the main idea is simply an application of Cramer's rule.
After performing the intersection test, we determine whether it is valid: if `t` lies outside the ray's valid range of `min_t` and `max_t` values, it is invalid. Otherwise, we update `max_t` to `t` so that future intersections with farther-away primitives are discarded (this is what solves the visibility problem). This information and more is stored inside the `Intersection *isect` structure.
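Putting the formula and the bookkeeping together, a sketch of the triangle routine might look like the following (vertex and normal names `p1`, `p2`, `p3` and `n1`, `n2`, `n3`, and a mutable `max_t` on the ray, are assumptions):

```cpp
// Sketch only: implements Möller–Trumbore plus the validity/max_t bookkeeping.
bool Triangle::intersect(const Ray& r, Intersection* isect) const {
  Vector3D e1 = p2 - p1, e2 = p3 - p1, s = r.o - p1;
  Vector3D s1 = cross(r.d, e2), s2 = cross(s, e1);
  double denom = dot(s1, e1);
  if (denom == 0) return false;                  // ray parallel to the triangle
  double t  = dot(s2, e2) / denom;
  double b1 = dot(s1, s)  / denom;
  double b2 = dot(s2, r.d) / denom;
  // Barycentric coordinates must lie inside the triangle...
  if (b1 < 0 || b2 < 0 || b1 + b2 > 1) return false;
  // ...and t must lie inside the ray's currently valid interval.
  if (t < r.min_t || t > r.max_t) return false;
  r.max_t = t;                                   // only accept nearer hits later
  isect->t = t;
  isect->n = ((1 - b1 - b2) * n1 + b1 * n2 + b2 * n3).unit();  // interpolated normal
  isect->primitive = this;
  isect->bsdf = get_bsdf();
  return true;
}
```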
Lastly, we add support for the sphere primitive. A sphere centered at $\mathbf{c}$ with radius $r$ is given by the set of points $\mathbf{p}$ that satisfy $(\mathbf{p} - \mathbf{c}) \cdot (\mathbf{p} - \mathbf{c}) - r^2 = 0$. Substituting the ray equation into the sphere equation gives

$$a t^2 + b t + c = 0$$

where

$$a = \mathbf{d} \cdot \mathbf{d}, \quad b = 2\,(\mathbf{o} - \mathbf{c}) \cdot \mathbf{d}, \quad c = (\mathbf{o} - \mathbf{c}) \cdot (\mathbf{o} - \mathbf{c}) - r^2.$$

From here we can recover the value(s) of $t$ with the quadratic formula, taking the closer of the two solutions if necessary. We perform the same validity checks on $t$ as before and update `max_t` and the `isect` appropriately.
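A corresponding sketch for the sphere test (with `o` the center and `r2` the squared radius as assumed member names):

```cpp
#include <cmath>

// Sketch only: solve the quadratic, keep the nearer valid root.
bool Sphere::intersect(const Ray& ray, Intersection* isect) const {
  Vector3D oc = ray.o - o;
  double a = dot(ray.d, ray.d);
  double b = 2 * dot(oc, ray.d);
  double c = dot(oc, oc) - r2;
  double disc = b * b - 4 * a * c;
  if (disc < 0) return false;                    // no real roots: the ray misses
  double sq = sqrt(disc);
  double t1 = (-b - sq) / (2 * a);
  double t2 = (-b + sq) / (2 * a);
  double t = (t1 >= ray.min_t && t1 <= ray.max_t) ? t1 : t2;  // prefer the nearer root
  if (t < ray.min_t || t > ray.max_t) return false;
  ray.max_t = t;
  isect->t = t;
  isect->n = (ray.o + t * ray.d - o).unit();     // outward normal at the hit point
  isect->primitive = this;
  isect->bsdf = get_bsdf();
  return true;
}
```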
At this point, we can now properly render scenes similar to those produced by rasterization. The screenshots below show the rendering of a few small dae files. Note that for now, the lighting is just a simple function of the surface normal vector.
When rendering the last two images shown above, we quickly notice a problem: any scene with more than a few dozen geometric primitives takes a very long time to render. This is because for each pixel, for each ray, we perform a linear scan of all primitives for possible intersections. This becomes prohibitively expensive for scenes with thousands or even millions of primitive mesh elements.
There are a variety of data structures that exploit spatial hierarchy (e.g. k-d trees) that can help solve this problem. For this project, we use a popular acceleration structure for ray tracing: a bounding volume hierarchy (BVH). A BVH is a binary tree over the geometric primitives: every node stores a bounding volume enclosing its primitives, and the primitives themselves are stored in the leaf nodes.
`PathTracer::build_accel()` is called as a pre-processing step before rendering. This method recursively builds the BVH from the top down, using the following logic:

1. Compute the bounding box of all primitives in the list and initialize a new `BVHNode` with that bounding box.
2. If there are at most `max_leaf_size` primitives in the list, this is a leaf node; allocate the node's primitive list and return the node.
3. Otherwise, this is an internal node: choose a split point, partition the primitives into left and right lists, recursively build the two subtrees, and return the node.

Any arbitrary splitting method would work, but the left and right subtrees should be as non-overlapping in space as possible for maximum efficiency. Splitting on the average of the centroids is a simple and easy-to-compute heuristic and works well for all but a few pathological cases. In the unlikely event that all primitives are sorted into one subtree (i.e. all centroid values along the split axis are exactly the same), we simply divide the list in two in order to prevent infinite recursion.
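A sketch of this construction is below; the choice of the bounding box's longest axis for the split, and member names such as `l`, `r`, and `prims`, are assumptions rather than the project's exact code:

```cpp
#include <vector>

// Sketch only: top-down BVH construction with an average-centroid split.
BVHNode* construct_bvh(const std::vector<Primitive*>& prims, size_t max_leaf_size) {
  // 1. Compute the bounding box of all primitives and create a node for it.
  BBox bbox;
  for (Primitive* p : prims) bbox.expand(p->get_bbox());
  BVHNode* node = new BVHNode(bbox);

  // 2. Few enough primitives: this is a leaf node.
  if (prims.size() <= max_leaf_size) {
    node->prims = new std::vector<Primitive*>(prims);
    return node;
  }

  // 3. Otherwise split on the average centroid along the box's longest axis.
  int axis = (bbox.extent.x > bbox.extent.y)
                 ? (bbox.extent.x > bbox.extent.z ? 0 : 2)
                 : (bbox.extent.y > bbox.extent.z ? 1 : 2);
  double split = 0;
  for (Primitive* p : prims) split += p->get_bbox().centroid()[axis];
  split /= prims.size();

  std::vector<Primitive*> left, right;
  for (Primitive* p : prims)
    (p->get_bbox().centroid()[axis] < split ? left : right).push_back(p);

  // Degenerate case: every centroid is identical, so halve the list instead.
  if (left.empty() || right.empty()) {
    left.assign(prims.begin(), prims.begin() + prims.size() / 2);
    right.assign(prims.begin() + prims.size() / 2, prims.end());
  }

  node->l = construct_bvh(left, max_leaf_size);
  node->r = construct_bvh(right, max_leaf_size);
  return node;
}
```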
BBox
To aid with the intersection algorithm, we first implement `BBox::intersect()`, which returns an interval of `t` values for which a given ray intersects the bounding box. The idea of the implementation is to compute the intersections with each pair of axis-aligned planes separately and take the intersection of the corresponding `t_min` and `t_max` intervals.

The ray-plane intersection is particularly simple for axis-aligned planes: if $\mathbf{p}$ is a point on a plane perpendicular to the $x$ axis, then the intersection occurs at

$$t = \frac{p_x - o_x}{d_x}$$

(with similar results for the $y$ and $z$ planes). We repeat the above calculation six times, once per axis for each of the minimum and maximum planes. The intersection of the resulting intervals gives the final result.
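A sketch of this slab test (assuming `min`/`max` corner members on `BBox`; axis-parallel rays fall out of the math via IEEE infinities):

```cpp
#include <algorithm>

// Sketch only: intersect the three per-axis t intervals with [t0, t1].
bool BBox::intersect(const Ray& r, double& t0, double& t1) const {
  double tmin = t0, tmax = t1;
  for (int axis = 0; axis < 3; axis++) {
    // Intersections with the two axis-aligned planes bounding this slab.
    double tA = (min[axis] - r.o[axis]) / r.d[axis];
    double tB = (max[axis] - r.o[axis]) / r.d[axis];
    if (tA > tB) std::swap(tA, tB);   // account for negative ray directions
    tmin = std::max(tmin, tA);        // running intersection of the intervals
    tmax = std::min(tmax, tB);
    if (tmin > tmax) return false;    // empty interval: the ray misses the box
  }
  t0 = tmin;
  t1 = tmax;
  return true;
}
```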
BVHAccel
We now implement BVHAccel::intersect()
to take advantage of the BVH structure. It is a recursive function initially called on the root from other parts of the code. If it is a leaf node, we loop over all primitives and check for intersections using the primitive’s intersect method defined earlier. Otherwise, it is an internal node. If the ray does not intersect the bounding box of the node, we return immediately. This is the key behind the BVH’s speed - we can discard nearly half of all primitives at each level if the ray does not touch that half of the bounding box. If the ray does intersect the bounding box of the node, we recursively call intersect()
on the left and right subtrees.
As an implementation detail, I struggled with debugging this method for a while. My error came from short-circuiting when I encountered an intersection in the left subtree - it is still critical to explore the right subtree because the bounding boxes may overlap and the ray may pass through both subtrees.
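A sketch of the traversal with that fix in place (node members `bb`, `l`, `r`, `prims`, and `isLeaf()` are assumed names):

```cpp
// Sketch only: recursive BVH traversal that always visits both children.
bool BVHAccel::intersect(const Ray& ray, Intersection* isect, BVHNode* node) const {
  double t0 = ray.min_t, t1 = ray.max_t;
  if (!node->bb.intersect(ray, t0, t1)) return false;  // prune the whole subtree

  if (node->isLeaf()) {
    bool hit = false;
    for (Primitive* p : *(node->prims))
      hit = p->intersect(ray, isect) || hit;  // intersect() also shrinks ray.max_t
    return hit;
  }

  // Internal node: visit BOTH children, since sibling boxes may overlap.
  bool hit_l = intersect(ray, isect, node->l);
  bool hit_r = intersect(ray, isect, node->r);
  return hit_l || hit_r;
}
```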
Below is a visualization of the BVH for the cow dae file. Starting from the root of the BVH, we show six levels of the tree, traversed by following the left or right pointers from each node.
We see that each level of the BVH roughly divides the scene into two halves, allowing a ray to quickly “zero-in” on the relevant part of the scene.
We can now efficiently render dae files with tens or hundreds of thousands of primitives:
Below is a table comparing rendering performance before and after adding the BVH code. In each comparison column, the non-BVH performance (i.e. the Part 1 code) is listed first and the BVH-accelerated performance second.
File | Primitives | Avg. intersection tests per ray (no BVH / BVH) | Render time (no BVH / BVH)
---|---|---|---
`CBspheres_lambertian.dae` | 14 | 9.9 / 8.2 | 0.029 s / 0.030 s
`cow.dae` | 5,856 | 834 / 3.29 | 17.810 s / 0.003 s
`dragon.dae` | 10,012 | 88,337 / 7.39 | 699.2 s / 0.1839 s
Note that the intersection test counter was found to not be thread-safe, leading to erroneous values. It is given here only as a point of comparison.
The results show that a BVH, even with simple heuristics, can make a drastic improvement in rendering time, reducing intersection complexity from linear to logarithmic in the number of primitives. We see that the BVH provides little benefit for `CBspheres_lambertian.dae`
since there are only 14 primitives, for which a linear scan is acceptable. However, files with more complicated geometries (i.e. more primitives) quickly lead to an intractable number of intersection tests per ray and consequently have long rendering times. The BVH structure keeps intersection tests to a minimum via efficient binary lookup of the relevant intersection part of the scene. Although the BVH construction step adds some additional pre-processing complexity, it is negligible compared to the rendering time of the non-optimized code. Even the most complicated files shown above were rendered in under a second.
From this point forward we begin to model more realistic shading. In this part, we add direct illumination estimation by casting one or more shadow rays from each ray-surface intersection point. A point is only illuminated if its shadow rays intersect a light source directly.
At each ray-surface intersection, we need a way of modeling the distribution of light in order to accurately accumulate it. The idea is to weight the illuminance arriving at a single point by a surface reflectance function (called the bidirectional reflectance distribution function, or BRDF) to determine how much of it will go towards a particular direction.
The BRDF $f_r(\omega_i \rightarrow \omega_o)$ is a function of the incoming light direction $\omega_i$ and the outgoing direction $\omega_o$ that returns the ratio of reflected radiance exiting along $\omega_o$ to the irradiance incident on the surface from direction $\omega_i$.
In this part of the project we only support Lambertian surfaces, which are perfectly diffuse surfaces. The reflected luminance is equal in all directions lying in the half-space above the surface; see the diagram below (source).
The diffuse BSDF is thus constant, given by

$$f_r = \frac{\rho}{\pi}$$

where $\rho$ is the albedo, ranging from 0 (total absorption) to 1 (total reflection). The $\frac{1}{\pi}$ is a normalizing factor arising from the fact that $\int_{H^2} \cos\theta_i \, d\omega_i = \pi$.
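The evaluation function for this material is a one-liner; a sketch assuming a stored `reflectance` albedo and a `PI` constant from the math headers:

```cpp
// Sketch only: the Lambertian BSDF is constant, independent of wo and wi.
Spectrum DiffuseBSDF::f(const Vector3D& wo, const Vector3D& wi) {
  return reflectance * (1.0 / PI);  // rho / pi
}
```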
With one material at our disposal, we can start direct lighting estimations. `PathTracer::estimate_direct_lighting_hemisphere()` is called by `one_bounce_radiance()` to get an estimate of the direct lighting at a point that a ray hits. An overview of the implementation is as follows. We repeat the following `num_samples` times:

1. Sample a direction `wi` in the object space hemisphere using `hemisphereSampler()`.
2. Convert `wi` into world space `wi_world` using the `o2w` transformation matrix.
3. Construct an `intersect` struct and a shadow ray with origin at the hit point and direction `wi_world`. Note that we add a small offset `EPS_D * wi_world` to the ray's origin for floating point precision purposes.
4. If the shadow ray intersects the scene, get the emission of the intersected object from `intersect.bsdf->get_emission()`. We then accumulate the light by weighting this emission by the current surface intersection's BSDF as well as the cosine of the unit vector `wi`.
5. Finally, average the accumulated light by dividing by `num_samples`.

As an implementation detail, I spent a long time debugging before realizing I was using the wrong direction vector in the cosine weighting term - we need to use the object space unit vector.
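A sketch of the whole loop, assuming helper names such as `make_coord_space`, `ns_area_light`, and `EPS_D` from the starter code; the $2\pi$ factor undoes the uniform hemisphere pdf of $1/2\pi$:

```cpp
// Sketch only: uniform hemisphere sampling of the direct lighting.
Spectrum PathTracer::estimate_direct_lighting_hemisphere(const Ray& r,
                                                         const Intersection& isect) {
  Matrix3x3 o2w;
  make_coord_space(o2w, isect.n);          // object space: z axis along the normal
  Vector3D hit_p = r.o + r.d * isect.t;
  Vector3D w_out = o2w.T() * (-r.d);

  int num_samples = scene->lights.size() * ns_area_light;  // assumed convention
  Spectrum L_out(0, 0, 0);
  for (int i = 0; i < num_samples; i++) {
    Vector3D wi = hemisphereSampler->get_sample();   // object-space direction
    Vector3D wi_world = o2w * wi;
    Ray shadow(hit_p + EPS_D * wi_world, wi_world);  // offset to avoid self-hits
    Intersection shadow_isect;
    if (bvh->intersect(shadow, &shadow_isect)) {
      // Only emissive surfaces contribute; weight by the BSDF, the cosine term
      // (wi.z in object space), and divide by the hemisphere pdf 1/(2*PI).
      Spectrum emission = shadow_isect.bsdf->get_emission();
      L_out += emission * isect.bsdf->f(w_out, wi) * wi.z * (2 * PI);
    }
  }
  return L_out * (1.0 / num_samples);
}
```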
We can now render scenes with Lambertian diffuse BSDFs with direct lighting. We get nice effects such as area light shadows and ambient occlusion.
File: CBspheres_lambertian.dae Camera rays per pixel: 16 Samples per area light: 8 |
File: CBspheres_lambertian.dae Camera rays per pixel: 64 Samples per area light: 32 |
Note that in this context, the samples-per-area-light setting controls `num_samples`.
While uniform hemisphere sampling works, it takes a high number of samples to converge. The shadow ray casting procedure is inefficient in that most of the rays will not directly intersect a light source and thus contribute nothing. We can do better by sampling the light sources directly.
An overview of the implementation of `PathTracer::estimate_direct_lighting_importance()` is as follows. For each `SceneLight sl` in the scene:

1. If the light is a delta (point) light, we only need a single sample; otherwise we take `ns_area_light` samples in a loop.
2. For each sample, call `SceneLight::sample_L()`, which takes in the hit point and returns the incoming radiance. It also calculates `wi`, a probabilistically sampled unit vector giving the direction from the hit point to the light source; the distance to the light source; and a `pdf` float giving the pdf evaluated at `wi`.
3. If the $z$ coordinate of the object-space `wi` is negative, we continue, as the sampled light lies behind the surface.
4. Otherwise, cast a shadow ray from the hit point in the direction `wi` (toward the light) with `max_t` equal to the distance to the light, since we ignore intersections behind the light. Note that we again add a slight offset of `EPS_D * wi` to the origin for floating point precision purposes.
5. If the shadow ray does not intersect the scene, the light is visible from the hit point: we accumulate the incoming radiance weighted by the BSDF and the cosine of `wi` in object space. We also divide by the pdf to account for the light sampling distribution, which may bias the sample towards certain parts of the light.

We can now render scenes with less noise and handle lighting from point light sources. A sketch of this sampling loop is given below, followed by renders of scenes with point light sources using importance sampling with 64 rays per pixel.
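As a rough sketch of the per-light loop above (the exact `sample_L()` signature, `is_delta_light()`, and other member names are assumptions rather than the project's exact code):

```cpp
// Sketch only: importance sampling of the light sources.
Spectrum PathTracer::estimate_direct_lighting_importance(const Ray& r,
                                                         const Intersection& isect) {
  Matrix3x3 o2w;
  make_coord_space(o2w, isect.n);
  Matrix3x3 w2o = o2w.T();
  Vector3D hit_p = r.o + r.d * isect.t;
  Vector3D w_out = w2o * (-r.d);

  Spectrum L_out(0, 0, 0);
  for (SceneLight* sl : scene->lights) {
    int n = sl->is_delta_light() ? 1 : ns_area_light;  // point lights need 1 sample
    Spectrum L_light(0, 0, 0);
    for (int i = 0; i < n; i++) {
      Vector3D wi_world;
      float dist_to_light, pdf;
      Spectrum radiance = sl->sample_L(hit_p, &wi_world, &dist_to_light, &pdf);
      Vector3D wi = w2o * wi_world;            // object-space direction
      if (wi.z < 0) continue;                  // the light lies behind the surface
      // Shadow ray toward the light; stop just short so we don't hit the light itself.
      Ray shadow(hit_p + EPS_D * wi_world, wi_world);
      shadow.max_t = dist_to_light - EPS_D;
      Intersection shadow_isect;
      if (!bvh->intersect(shadow, &shadow_isect)) {
        // Unoccluded: weight by the BSDF, the cosine term, and the sampling pdf.
        L_light += radiance * isect.bsdf->f(w_out, wi) * wi.z / pdf;
      }
    }
    L_out += L_light * (1.0 / n);
  }
  return L_out;
}
```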
File: dragon.dae Camera rays per pixel: 64 Samples per area light: 32 |
File: wall-e.dae Camera rays per pixel: 64 Samples per area light: 32 |
Below we compare the noise levels in soft shadows in CBbunny.dae
for various numbers of light rays with 1 ray per pixel.
Samples per area light: 1 | Samples per area light: 4 |
Samples per area light: 16 | Samples per area light: 64 |
It can be shown that the variance of the estimate decreases in proportion to $1/n$, where $n$ is the number of samples.
We now directly compare the results of the two lighting functions side-by-side for the same number of samples.
Uniform Hemisphere Sampling | Importance Sampling |
Camera rays per pixel: 16 Samples per area light: 8 |
Camera rays per pixel: 16 Samples per area light: 8 |
Camera rays per pixel: 64 Samples per area light: 32 |
Camera rays per pixel: 64 Samples per area light: 32 |
Camera rays per pixel: 128 Samples per area light: 64 |
Camera rays per pixel: 128 Samples per area light: 64 |
We see that importance sampling converges much faster than uniform hemisphere sampling for the same parameters. The noise in uniform hemisphere sampling is due to the fact that only a small portion of the rays cast from the hit point actually intersect a light source directly, thus leading to significant dark patches. In contrast, in importance sampling we only consider those rays that contribute to the illumination of the surface point. This results in much smoother renderings for the same computational cost.
The real magic of path tracing comes from its ability to simulate full global illumination. We add support for indirect lighting, which is the light reflected from other surfaces.
The function `est_radiance_global_illumination()` first calls `zero_bounce_radiance()` to get the light from light sources that reaches the camera directly, using `isect.bsdf->get_emission()`. It then accumulates the light from `at_least_one_bounce_radiance()`, which is the main implementation work. This function works as follows:

1. Call `one_bounce_radiance()`, which is simply the importance sampling lighting function from Part 3. This gives us the direct lighting at the point.
2. Return only this direct lighting if the ray's depth (initialized to `max_ray_depth`) is less than 1, or if a Russian roulette coin flip tells us to terminate the path.
3. Otherwise, sample the BSDF at the intersection with `isect.bsdf->sample_f()`. Given the outgoing radiance direction `w_out`, this function returns the BSDF value as a `Spectrum` and computes the probabilistically sampled `w_in` unit vector giving the incoming radiance direction and a `pdf` float giving the probability density function evaluated at `w_in`.
4. Cast a new ray in the direction `w_in` (converted to world space coordinates) with the origin at the hit point. As usual, the ray's origin is offset by `EPS_D` times the ray's direction. We also make sure to decrement the ray's depth by 1.
5. If the new ray intersects the `bvh`, we obtain the radiance along the ray by recursively calling `at_least_one_bounce_radiance()` on the new ray. We then accumulate this indirect light by weighting it by the current surface intersection's BSDF as well as the cosine of the unit vector `w_in`. We additionally divide by the `pdf` and the Russian roulette continuation probability.

The algorithm is fairly intuitive, but the Russian roulette step may seem odd. Obviously we cannot allow infinitely many bounces of light, but it turns out that setting a maximum depth leads to a biased estimate. Russian roulette provides an unbiased estimate by culling some of the paths and sharing their contribution among the surviving paths. Here, we use it in combination with a max ray depth to get a reasonably accurate but practical estimate of global illumination.
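A sketch of the recursion, with `coin_flip()` and the 0.7 continuation probability as assumed stand-ins for the actual termination logic:

```cpp
// Sketch only: one level of the indirect lighting recursion.
Spectrum PathTracer::at_least_one_bounce_radiance(const Ray& r,
                                                  const Intersection& isect) {
  Matrix3x3 o2w;
  make_coord_space(o2w, isect.n);
  Vector3D hit_p = r.o + r.d * isect.t;
  Vector3D w_out = o2w.T() * (-r.d);

  // Direct lighting at this point (the Part 3 importance-sampling estimate).
  Spectrum L_out = one_bounce_radiance(r, isect);

  double cpdf = 0.7;  // Russian roulette continuation probability (assumed value)
  if (r.depth <= 1 || !coin_flip(cpdf)) return L_out;

  // Sample an incoming direction from the BSDF.
  Vector3D w_in;
  float pdf;
  Spectrum f = isect.bsdf->sample_f(w_out, &w_in, &pdf);

  // Bounce a new ray (offset to avoid self-intersection) one level deeper.
  Vector3D w_in_world = o2w * w_in;
  Ray bounce(hit_p + EPS_D * w_in_world, w_in_world);
  bounce.depth = r.depth - 1;

  Intersection bounce_isect;
  if (bvh->intersect(bounce, &bounce_isect)) {
    Spectrum Li = at_least_one_bounce_radiance(bounce, bounce_isect);
    // Weight by the BSDF and cosine term; divide by the pdf and the
    // continuation probability to keep the estimate unbiased.
    L_out += Li * f * w_in.z / (pdf * cpdf);
  }
  return L_out;
}
```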
Below is a rendering of bunny.dae
with global illumination using 1024 samples per pixel, 4 samples per area light, and a max ray depth of 5.
Observe the nice ambient color bleeding from the walls on the surface of the bunny. Another rendering of wall-e.dae
with the same parameters is shown below.
Below is a rendering of CBspheres_lambertian.dae
with ONLY direct lighting (left) and ONLY indirect lighting (right). Both renders are using 1024 samples per pixel, 4 samples per area light, and a max ray depth of 5.
Parts of the scene lit only by indirect bounces of light, such as the ceiling and the undersides of the spheres, appear only in the indirect lighting render.
Below we compare rendered views of CBbunny.dae
with different max_ray_depth
values. Each rendering uses 1024 samples per pixel and 4 samples per area light.
max_ray_depth : 0 |
max_ray_depth : 1 |
max_ray_depth : 2 |
max_ray_depth : 3 |
max_ray_depth : 4 |
max_ray_depth : 100 |
The biggest change occurs from max_ray_depth
of zero (which is essentially direct lighting) to one. The scene converges after 3 bounces, at which point increasing the max_ray_depth
further has no discernible effect on the result.
Below we compare rendered views of CBspheres_lambertian.dae
with various sample-per-pixel rates, all with 4 samples per area light and a max ray depth of 5.
Camera rays per pixel: 1 | Camera rays per pixel: 2 |
Camera rays per pixel: 4 | Camera rays per pixel: 8 |
Camera rays per pixel: 16 | Camera rays per pixel: 64 |
Camera rays per pixel: 1024 |
We see that global illumination produces great results, but requires a high number of samples per pixel to get the noise under control. The 800x600 resolution renders using 1024 rays per pixel take about 30 minutes on the instructional machines.
Not every pixel requires a large number of samples to eliminate noise. Certain parts of the scene converge faster with low sampling rates while other parts may require thousands of samples. Adaptive sampling is an optimization technique that avoids using a fixed high number of samples per pixel by concentrating the work in the more difficult parts of the image.
We implement a very simple algorithm that uses statistics to indicate whether a pixel has converged. Given $n$ samples with a mean of $\mu$ and standard deviation of $\sigma$, we quantify the convergence with

$$I = 1.96 \cdot \frac{\sigma}{\sqrt{n}}.$$

If $I \le \text{maxTolerance} \cdot \mu$, where `maxTolerance = 0.05` by default, we assume that the pixel has converged and stop tracing additional rays for the pixel.

We implement this approach in `PathTracer::raytrace_pixel()`. Within the loop, we use each sample's illuminance $x_k$ to accumulate two variables:

$$s_1 = \sum_{k=1}^{n} x_k, \qquad s_2 = \sum_{k=1}^{n} x_k^2.$$

The mean and variance of all samples so far are thus given by

$$\mu = \frac{s_1}{n}, \qquad \sigma^2 = \frac{1}{n - 1}\left(s_2 - \frac{s_1^2}{n}\right).$$

Every `samplesPerBatch` iterations of the loop, we update the estimates of $\mu$ and $\sigma$ and check for convergence to see if we can terminate the loop early. We track the number of samples used in a buffer in order to generate a sampling-rate image at the end.
As an implementation detail, I spent some time debugging this function. It turned out I was incorrectly averaging samples by dividing by `num_samples` regardless of whether the loop terminated early due to adaptive sampling.
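A sketch of the adaptive version of the pixel loop, including that fix (dividing by the number of samples actually taken); `illum()`, `samplesPerBatch`, `maxTolerance`, and `sampleCountBuffer` are assumed names:

```cpp
#include <cmath>

// Sketch only: per-pixel loop with batched convergence checks.
Spectrum PathTracer::raytrace_pixel(size_t x, size_t y) {
  Spectrum total(0, 0, 0);
  double s1 = 0, s2 = 0;       // running sums of illuminance and its square
  int num_samples = ns_aa;
  int n = 0;
  for (; n < num_samples; n++) {
    // Check convergence once per batch of samples.
    if (n > 0 && n % samplesPerBatch == 0) {
      double mu = s1 / n;
      double var = (s2 - s1 * s1 / n) / (n - 1);
      double I = 1.96 * sqrt(var / n);
      if (I <= maxTolerance * mu) break;   // converged: stop sampling early
    }
    Vector2D rnd = gridSampler->get_sample();
    Ray ray = camera->generate_ray((x + rnd.x) / sampleBuffer.w,
                                   (y + rnd.y) / sampleBuffer.h);
    Spectrum sample = est_radiance_global_illumination(ray);
    total += sample;
    double ill = sample.illum();
    s1 += ill;
    s2 += ill * ill;
  }
  sampleCountBuffer[x + y * sampleBuffer.w] = n;  // for the sampling-rate image
  // Divide by the number of samples actually taken, not the maximum allowed.
  return total * (1.0 / n);
}
```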
Below is a view of CBspheres_lambertian.dae
with 2048 samples per pixel, 1 sample per area light and a max ray depth of 5.
Below is a “heat map” of the image; red and blue represent high and low sampling rates, respectively. The sampling rate is computed as the ratio between the number of samples actually taken for a given pixel and the maximum number of samples allowed.
We see that lighter areas in the scene like the ceiling light and floor converged relatively quickly while darker areas such as the sphere’s surface facing the camera took longer.
Below is another rendering of CBbunny.dae
with the same parameters using adaptive sampling. The rate image is shown beneath it.
Adaptive sampling thus provides noise-free images with a running time no longer than rendering without it. Convergence still takes a considerable amount of time, however; the two images above took about 20 minutes on the instructional machines.