In this project, I implemented the core routines of a path tracing rendering program. Path tracing is a powerful physically-based rendering technique that simulates the behavior of light in order to produce accurate global illumination and thus photorealistic images.
Given a geometric representation of a 3D scene and a camera position, path tracing works by casting rays through each pixel in the output image and averaging their values to calculate the final color of each pixel. The value of each ray is the sum of direct and indirect lighting: whenever a ray intersects a surface, the direct lighting at that point is computed by casting shadow ray(s) toward the light sources in the scene, while indirect lighting is computed by recursively casting a new ray from the hit point in a random direction.
This is one of the coolest projects I’ve ever done. Path tracing is a simple yet brilliant idea that draws on real-world physics (along with some liberties, namely reversing the flow of light) to produce stunning results. This project gave me insight into the core lighting techniques used in modern graphics and animation. I also have a new appreciation for the acceleration data structures and mathematical techniques developed to make path tracing computationally feasible.
We begin by filling in the core rendering function `PathTracer::raytrace_pixel()`. The rendering portion of the program works by dividing the output image into tiles, which are placed on a work queue processed by a user-specified number of threads. Each thread calls `raytrace_pixel()` for each pixel inside the tile's extent.

Given integer pixel coordinates $(x, y)$ and the desired number of samples per pixel, we cast rays through $(x + r_x,\, y + r_y)$ (normalized by the image width and height), where $r_x, r_y$ are randomly sampled from $[0, 1)$, by calling `camera->generate_ray(...)`. If we are only sampling once per pixel, we simply choose the center of the pixel, i.e. $(x + 0.5,\, y + 0.5)$. We then call `est_radiance_global_illumination(...)` to get the radiance along the ray, and finally average the results from all samples to return a final `Spectrum` (an RGB color value).
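As a rough sketch of this sampling loop (member names such as `ns_aa`, `gridSampler`, and `sampleBuffer` are assumptions about the starter code rather than its exact API):

```cpp
// Sketch only: ns_aa (samples per pixel), gridSampler, and sampleBuffer are
// assumed names for the starter code's members.
Spectrum PathTracer::raytrace_pixel(size_t x, size_t y) {
  Spectrum total(0, 0, 0);
  int num_samples = ns_aa;
  for (int i = 0; i < num_samples; i++) {
    // With a single sample we shoot through the pixel center; otherwise jitter.
    Vector2D r = (num_samples == 1) ? Vector2D(0.5, 0.5)
                                    : gridSampler->get_sample();
    // Normalize to [0, 1]^2 image coordinates before handing off to the camera.
    Ray ray = camera->generate_ray((x + r.x) / sampleBuffer.w,
                                   (y + r.y) / sampleBuffer.h);
    total += est_radiance_global_illumination(ray);
  }
  return total * (1.0 / num_samples);  // average of the per-sample radiance
}
```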
The camera has its own coordinate system in which the camera sits at the origin looking along the $-z$ axis. Given the camera's field of view angles `hFov` and `vFov` (in radians), we define the sensor plane one unit along the view direction, with its bottom left corner at $\left(-\tan\frac{\text{hFov}}{2},\ -\tan\frac{\text{vFov}}{2},\ -1\right)$ and its top right corner at $\left(\tan\frac{\text{hFov}}{2},\ \tan\frac{\text{vFov}}{2},\ -1\right)$.

`Camera::generate_ray()` takes in the point $(x, y)$ passed in from `raytrace_pixel()`. The input is converted into camera coordinates such that $(0, 0)$ maps to the bottom left and $(1, 1)$ maps to the top right corner of the sensor plane. The point is then converted into world space with the transformation matrix `c2w`; this (normalized) vector becomes the direction of the ray, while the origin is simply the camera's position `pos`.
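A minimal sketch of the ray generation under these conventions (assuming `hFov` and `vFov` are already in radians, and that `Vector3D`, `Matrix3x3`, and `Ray` behave like the project's math types):

```cpp
#include <cmath>

// Sketch only: maps (x, y) in [0, 1]^2 onto the sensor plane and transforms
// the resulting direction into world space.
Ray Camera::generate_ray(double x, double y) const {
  // Sensor plane sits at z = -1 in camera space, spanning +/- tan(fov / 2).
  double sensor_x = (2 * x - 1) * tan(0.5 * hFov);
  double sensor_y = (2 * y - 1) * tan(0.5 * vFov);
  Vector3D dir_camera(sensor_x, sensor_y, -1);
  // Rotate into world space with c2w and normalize; the ray starts at pos.
  Vector3D dir_world = (c2w * dir_camera).unit();
  return Ray(pos, dir_world);
}
```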
At the core of ray tracing is ray-scene intersection. Information about intersections allows us to determine the visibility, shadows, lighting, and more of objects from the camera’s point of view.
A ray is parametrized by $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$ for $t \in [t_{\min}, t_{\max}]$, where $\mathbf{o}$ is the ray's origin and $\mathbf{d}$ is its (unit) direction.
As the basis of polygon meshes, the triangle is the most important primitive. Given a triangle with vertices $\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2$, a point inside the triangle can be written in barycentric coordinates as $(1 - b_1 - b_2)\,\mathbf{p}_0 + b_1\,\mathbf{p}_1 + b_2\,\mathbf{p}_2$. To calculate the values associated with an intersection, we set

$$\mathbf{o} + t\mathbf{d} = (1 - b_1 - b_2)\,\mathbf{p}_0 + b_1\,\mathbf{p}_1 + b_2\,\mathbf{p}_2$$

and solve for the unknowns $t$, $b_1$, and $b_2$.

The Möller–Trumbore algorithm is a fast method for this purpose. The solution is

$$\begin{bmatrix} t \\ b_1 \\ b_2 \end{bmatrix} = \frac{1}{\mathbf{s}_1 \cdot \mathbf{e}_1} \begin{bmatrix} \mathbf{s}_2 \cdot \mathbf{e}_2 \\ \mathbf{s}_1 \cdot \mathbf{s} \\ \mathbf{s}_2 \cdot \mathbf{d} \end{bmatrix}$$

where

$$\mathbf{e}_1 = \mathbf{p}_1 - \mathbf{p}_0, \quad \mathbf{e}_2 = \mathbf{p}_2 - \mathbf{p}_0, \quad \mathbf{s} = \mathbf{o} - \mathbf{p}_0, \quad \mathbf{s}_1 = \mathbf{d} \times \mathbf{e}_2, \quad \mathbf{s}_2 = \mathbf{s} \times \mathbf{e}_1.$$

A full derivation can be found here, but the main idea is simply an application of Cramer's rule.
After performing the intersection test, we determine whether it is valid: if `t` lies outside the ray's valid range of `min_t` and `max_t` values, it is invalid. Otherwise, we update `max_t` to `t` so that future intersections with farther-away primitives are discarded (this is what solves the visibility problem). This information and more is stored inside the `Intersection *isect` structure.
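Putting the formula and the bookkeeping together, a sketch of the triangle routine might look like the following (vertex and normal names `p1`, `p2`, `p3` and `n1`, `n2`, `n3`, and a mutable `max_t` on the ray, are assumptions):

```cpp
// Sketch only: implements Möller–Trumbore plus the validity/max_t bookkeeping.
bool Triangle::intersect(const Ray& r, Intersection* isect) const {
  Vector3D e1 = p2 - p1, e2 = p3 - p1, s = r.o - p1;
  Vector3D s1 = cross(r.d, e2), s2 = cross(s, e1);
  double denom = dot(s1, e1);
  if (denom == 0) return false;                  // ray parallel to the triangle
  double t  = dot(s2, e2) / denom;
  double b1 = dot(s1, s)  / denom;
  double b2 = dot(s2, r.d) / denom;
  // Barycentric coordinates must lie inside the triangle...
  if (b1 < 0 || b2 < 0 || b1 + b2 > 1) return false;
  // ...and t must lie inside the ray's currently valid interval.
  if (t < r.min_t || t > r.max_t) return false;
  r.max_t = t;                                   // only accept nearer hits later
  isect->t = t;
  isect->n = ((1 - b1 - b2) * n1 + b1 * n2 + b2 * n3).unit();  // interpolated normal
  isect->primitive = this;
  isect->bsdf = get_bsdf();
  return true;
}
```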
Lastly, we add support for the sphere primitive. A sphere centered at $\mathbf{c}$ with radius $r$ is given by the set of points $\mathbf{p}$ that satisfy $(\mathbf{p} - \mathbf{c}) \cdot (\mathbf{p} - \mathbf{c}) - r^2 = 0$. Substituting the ray equation into the sphere equation gives

$$a t^2 + b t + c = 0$$

where

$$a = \mathbf{d} \cdot \mathbf{d}, \quad b = 2\,(\mathbf{o} - \mathbf{c}) \cdot \mathbf{d}, \quad c = (\mathbf{o} - \mathbf{c}) \cdot (\mathbf{o} - \mathbf{c}) - r^2.$$

From here we can recover the value(s) of $t$ with the quadratic formula, taking the closer of the two solutions if necessary. We perform the same validity checks on $t$ as before and update `max_t` and the `isect` appropriately.
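A corresponding sketch for the sphere test (with `o` the center and `r2` the squared radius as assumed member names):

```cpp
#include <cmath>

// Sketch only: solve the quadratic, keep the nearer valid root.
bool Sphere::intersect(const Ray& ray, Intersection* isect) const {
  Vector3D oc = ray.o - o;
  double a = dot(ray.d, ray.d);
  double b = 2 * dot(oc, ray.d);
  double c = dot(oc, oc) - r2;
  double disc = b * b - 4 * a * c;
  if (disc < 0) return false;                    // no real roots: the ray misses
  double sq = sqrt(disc);
  double t1 = (-b - sq) / (2 * a);
  double t2 = (-b + sq) / (2 * a);
  double t = (t1 >= ray.min_t && t1 <= ray.max_t) ? t1 : t2;  // prefer the nearer root
  if (t < ray.min_t || t > ray.max_t) return false;
  ray.max_t = t;
  isect->t = t;
  isect->n = (ray.o + t * ray.d - o).unit();     // outward normal at the hit point
  isect->primitive = this;
  isect->bsdf = get_bsdf();
  return true;
}
```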
At this point, we can now properly render scenes similar to those produced by rasterization. The screenshots below show the rendering of a few small dae files. Note that for now, the lighting is just a simple function of the surface normal vector.
When rendering the last two images shown above, we quickly notice a problem: any scene with more than a few dozen geometric primitives takes a very long time to render. This is because for each pixel, for each ray, we perform a linear scan of all primitives for possible intersections. This becomes prohibitively expensive for scenes with thousands or even millions of primitive mesh elements.
There are a variety of data structures that exploit spatial hierarchy (e.g. k-d trees) that can help solve this problem. For this project, we use a popular acceleration structure for ray tracing: a bounding volume hierarchy (BVH). A BVH is a binary tree over the geometric primitives: every node stores a bounding volume enclosing its primitives, and the primitives themselves are stored in the leaf nodes.
`PathTracer::build_accel()` is called as a pre-processing step before rendering. This method recursively builds the BVH from the top down, using the following logic:

1. Compute the bounding box of all primitives in the list and initialize a new `BVHNode` with that bounding box.
2. If there are at most `max_leaf_size` primitives in the list, this is a leaf node; allocate the node's primitive list and return the node.
3. Otherwise, this is an internal node: choose a split point, partition the primitives into left and right lists, recursively build the two subtrees, and return the node.

Any arbitrary splitting method would work, but the left and right subtrees should be as non-overlapping in space as possible for maximum efficiency. Splitting on the average of the centroids is a simple and easy-to-compute heuristic and works well for all but a few pathological cases. In the unlikely event that all primitives are sorted into one subtree (i.e. all centroid values along the split axis are exactly the same), we simply divide the list in two in order to prevent infinite recursion.
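A sketch of this construction is below; the choice of the bounding box's longest axis for the split, and member names such as `l`, `r`, and `prims`, are assumptions rather than the project's exact code:

```cpp
#include <vector>

// Sketch only: top-down BVH construction with an average-centroid split.
BVHNode* construct_bvh(const std::vector<Primitive*>& prims, size_t max_leaf_size) {
  // 1. Compute the bounding box of all primitives and create a node for it.
  BBox bbox;
  for (Primitive* p : prims) bbox.expand(p->get_bbox());
  BVHNode* node = new BVHNode(bbox);

  // 2. Few enough primitives: this is a leaf node.
  if (prims.size() <= max_leaf_size) {
    node->prims = new std::vector<Primitive*>(prims);
    return node;
  }

  // 3. Otherwise split on the average centroid along the box's longest axis.
  int axis = (bbox.extent.x > bbox.extent.y)
                 ? (bbox.extent.x > bbox.extent.z ? 0 : 2)
                 : (bbox.extent.y > bbox.extent.z ? 1 : 2);
  double split = 0;
  for (Primitive* p : prims) split += p->get_bbox().centroid()[axis];
  split /= prims.size();

  std::vector<Primitive*> left, right;
  for (Primitive* p : prims)
    (p->get_bbox().centroid()[axis] < split ? left : right).push_back(p);

  // Degenerate case: every centroid is identical, so halve the list instead.
  if (left.empty() || right.empty()) {
    left.assign(prims.begin(), prims.begin() + prims.size() / 2);
    right.assign(prims.begin() + prims.size() / 2, prims.end());
  }

  node->l = construct_bvh(left, max_leaf_size);
  node->r = construct_bvh(right, max_leaf_size);
  return node;
}
```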
BBox
To aid with the intersection algorithm, we first implement `BBox::intersect()`, which returns an interval of `t` values for which a given ray intersects the bounding box. The idea of the implementation is to compute the intersections with each pair of axis-aligned planes separately and take the intersection of the corresponding `t_min` and `t_max` intervals.

The ray-plane intersection is particularly simple for axis-aligned planes: if $\mathbf{p}$ is a point on a plane perpendicular to the $x$ axis, then the intersection occurs at

$$t = \frac{p_x - o_x}{d_x}$$

(with similar results for the $y$ and $z$ planes). We repeat the above calculation six times, once per axis for each of the minimum and maximum planes. The intersection of the resulting intervals gives the final result.
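A sketch of this slab test (assuming `min`/`max` corner members on `BBox`; axis-parallel rays fall out of the math via IEEE infinities):

```cpp
#include <algorithm>

// Sketch only: intersect the three per-axis t intervals with [t0, t1].
bool BBox::intersect(const Ray& r, double& t0, double& t1) const {
  double tmin = t0, tmax = t1;
  for (int axis = 0; axis < 3; axis++) {
    // Intersections with the two axis-aligned planes bounding this slab.
    double tA = (min[axis] - r.o[axis]) / r.d[axis];
    double tB = (max[axis] - r.o[axis]) / r.d[axis];
    if (tA > tB) std::swap(tA, tB);   // account for negative ray directions
    tmin = std::max(tmin, tA);        // running intersection of the intervals
    tmax = std::min(tmax, tB);
    if (tmin > tmax) return false;    // empty interval: the ray misses the box
  }
  t0 = tmin;
  t1 = tmax;
  return true;
}
```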
BVHAccel
We now implement BVHAccel::intersect()
to take advantage of the BVH structure. It is a recursive function initially called on the root from other parts of the code. If it is a leaf node, we loop over all primitives and check for intersections using the primitive’s intersect method defined earlier. Otherwise, it is an internal node. If the ray does not intersect the bounding box of the node, we return immediately. This is the key behind the BVH’s speed - we can discard nearly half of all primitives at each level if the ray does not touch that half of the bounding box. If the ray does intersect the bounding box of the node, we recursively call intersect()
on the left and right subtrees.
As an implementation detail, I struggled with debugging this method for a while. My error came from short-circuiting when I encountered an intersection in the left subtree - it is still critical to explore the right subtree because the bounding boxes may overlap and the ray may pass through both subtrees.
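A sketch of the traversal with that fix in place (node members `bb`, `l`, `r`, `prims`, and `isLeaf()` are assumed names):

```cpp
// Sketch only: recursive BVH traversal that always visits both children.
bool BVHAccel::intersect(const Ray& ray, Intersection* isect, BVHNode* node) const {
  double t0 = ray.min_t, t1 = ray.max_t;
  if (!node->bb.intersect(ray, t0, t1)) return false;  // prune the whole subtree

  if (node->isLeaf()) {
    bool hit = false;
    for (Primitive* p : *(node->prims))
      hit = p->intersect(ray, isect) || hit;  // intersect() also shrinks ray.max_t
    return hit;
  }

  // Internal node: visit BOTH children, since sibling boxes may overlap.
  bool hit_l = intersect(ray, isect, node->l);
  bool hit_r = intersect(ray, isect, node->r);
  return hit_l || hit_r;
}
```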
Below is a visualization of the BVH for the cow dae file. Starting from the root of the BVH, we show six levels of the tree, traversed by following the left or right pointers from each node.
We see that each level of the BVH roughly divides the scene into two halves, allowing a ray to quickly “zero-in” on the relevant part of the scene.
We can now efficiently render dae files with tens or hundreds of thousands of primitives:
Below is a table comparing rendering performance before and after adding the BVH code. In each comparison column, the non-BVH performance (i.e. the Part 1 code) is listed first and the BVH-accelerated performance second.
File | Primitives | Avg. intersection tests per ray (no BVH / BVH) | Render time (no BVH / BVH)
---|---|---|---
`CBspheres_lambertian.dae` | 14 | 9.9 / 8.2 | 0.029 s / 0.030 s
`cow.dae` | 5,856 | 834 / 3.29 | 17.810 s / 0.003 s
`dragon.dae` | 10,012 | 88,337 / 7.39 | 699.2 s / 0.1839 s
Note that the intersection test counter was found to not be thread-safe, leading to erroneous values. It is given here only as a point of comparison.
The results show that a BVH, even with simple heuristics, can make a drastic improvement in rendering time, reducing intersection complexity from linear to logarithmic in the number of primitives. We see that the BVH provides little benefit for `CBspheres_lambertian.dae`
since there are only 14 primitives, for which a linear scan is acceptable. However, files with more complicated geometries (i.e. more primitives) quickly lead to an intractable number of intersection tests per ray and consequently have long rendering times. The BVH structure keeps intersection tests to a minimum via efficient binary lookup of the relevant intersection part of the scene. Although the BVH construction step adds some additional pre-processing complexity, it is negligible compared to the rendering time of the non-optimized code. Even the most complicated files shown above were rendered in under a second.
From this point forward we begin to model more realistic shading. In this part, we add direct illumination estimation by casting one or more shadow rays from each ray-surface intersection point. A point is only illuminated if its shadow rays intersect a light source directly.
At each ray-surface intersection, we need a way of modeling the distribution of light in order to accurately accumulate it. The idea is to weight the illuminance arriving at a single point by a surface reflectance function (called the bidirectional reflectance distribution function, or BRDF) to determine how much of it will go towards a particular direction.
The BRDF $f_r(\omega_i \rightarrow \omega_o)$ is a function of the incoming light direction $\omega_i$ and the outgoing direction $\omega_o$ that returns the ratio of reflected radiance exiting along $\omega_o$ to the irradiance incident on the surface from direction $\omega_i$.
In this part of the project we only support Lambertian surfaces, which are perfectly diffuse surfaces. The reflected luminance is equal in all directions lying in the half-space above the surface; see the diagram below (source).
The diffuse BSDF is thus constant, given by

$$f_r = \frac{\rho}{\pi}$$

where $\rho$ is the albedo, ranging from 0 (total absorption) to 1 (total reflection). The $\frac{1}{\pi}$ is a normalizing factor arising from the fact that $\int_{H^2} \cos\theta_i \, d\omega_i = \pi$.
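The evaluation function for this material is a one-liner; a sketch assuming a stored `reflectance` albedo and a `PI` constant from the math headers:

```cpp
// Sketch only: the Lambertian BSDF is constant, independent of wo and wi.
Spectrum DiffuseBSDF::f(const Vector3D& wo, const Vector3D& wi) {
  return reflectance * (1.0 / PI);  // rho / pi
}
```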
With one material at our disposal, we can start direct lighting estimations. `PathTracer::estimate_direct_lighting_hemisphere()` is called by `one_bounce_radiance()` to get an estimate of the direct lighting at a point that a ray hits. An overview of the implementation is as follows. We repeat the following `num_samples` times:

1. Sample a direction `wi` in the object space hemisphere using `hemisphereSampler()`.
2. Convert `wi` into world space `wi_world` using the `o2w` transformation matrix.
3. Construct an `intersect` struct and a shadow ray with origin at the hit point and direction `wi_world`. Note that we add a small offset `EPS_D * wi_world` to the ray's origin for floating point precision purposes.
4. If the shadow ray intersects the scene, get the emission of the intersected object from `intersect.bsdf->get_emission()`. We then accumulate the light by weighting this emission by the current surface intersection's BSDF as well as the cosine of the unit vector `wi`.
5. Finally, average the accumulated light by dividing by `num_samples`.

As an implementation detail, I spent a long time debugging before realizing I was using the wrong direction vector in the cosine weighting term - we need to use the object space unit vector.
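A sketch of the whole loop, assuming helper names such as `make_coord_space`, `ns_area_light`, and `EPS_D` from the starter code; the $2\pi$ factor undoes the uniform hemisphere pdf of $1/2\pi$:

```cpp
// Sketch only: uniform hemisphere sampling of the direct lighting.
Spectrum PathTracer::estimate_direct_lighting_hemisphere(const Ray& r,
                                                         const Intersection& isect) {
  Matrix3x3 o2w;
  make_coord_space(o2w, isect.n);          // object space: z axis along the normal
  Vector3D hit_p = r.o + r.d * isect.t;
  Vector3D w_out = o2w.T() * (-r.d);

  int num_samples = scene->lights.size() * ns_area_light;  // assumed convention
  Spectrum L_out(0, 0, 0);
  for (int i = 0; i < num_samples; i++) {
    Vector3D wi = hemisphereSampler->get_sample();   // object-space direction
    Vector3D wi_world = o2w * wi;
    Ray shadow(hit_p + EPS_D * wi_world, wi_world);  // offset to avoid self-hits
    Intersection shadow_isect;
    if (bvh->intersect(shadow, &shadow_isect)) {
      // Only emissive surfaces contribute; weight by the BSDF, the cosine term
      // (wi.z in object space), and divide by the hemisphere pdf 1/(2*PI).
      Spectrum emission = shadow_isect.bsdf->get_emission();
      L_out += emission * isect.bsdf->f(w_out, wi) * wi.z * (2 * PI);
    }
  }
  return L_out * (1.0 / num_samples);
}
```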
We can now render scenes with Lambertian diffuse BSDFs with direct lighting. We get nice effects such as area light shadows and ambient occlusion.
File: CBspheres_lambertian.dae Camera rays per pixel: 16 Samples per area light: 8 |
File: CBspheres_lambertian.dae Camera rays per pixel: 64 Samples per area light: 32 |
Note that in this context, the samples-per-area-light setting controls `num_samples`.
While uniform hemisphere sampling works, it takes a high number of samples to converge. The shadow ray casting procedure is inefficient in that most of the rays will not directly intersect a light source and thus contribute nothing. We can do better by sampling the light sources directly.
An overview of the implementation of `PathTracer::estimate_direct_lighting_importance()` is as follows. For each `SceneLight sl` in the scene:

1. If the light is a delta (point) light, we only need a single sample; otherwise we take `ns_area_light` samples in a loop.
2. For each sample, call `SceneLight::sample_L()`, which takes in the hit point and returns the incoming radiance. It also calculates `wi`, a probabilistically sampled unit vector giving the direction from the hit point to the light source; the distance to the light source; and a `pdf` float giving the pdf evaluated at `wi`.
3. If the $z$ coordinate of the object-space `wi` is negative, we continue, as the sampled light lies behind the surface.
4. Otherwise, cast a shadow ray from the hit point in the direction `wi` (toward the light) with `max_t` equal to the distance to the light, since we ignore intersections behind the light. Note that we again add a slight offset of `EPS_D * wi` to the origin for floating point precision purposes.
5. If the shadow ray does not intersect the scene, the light is visible from the hit point: we accumulate the incoming radiance weighted by the BSDF and the cosine of `wi` in object space. We also divide by the pdf to account for the light sampling distribution, which may bias the sample towards certain parts of the light.

We can now render scenes with less noise and handle lighting from point light sources. A sketch of this sampling loop is given below, followed by renders of scenes with point light sources using importance sampling with 64 rays per pixel.
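As a rough sketch of the per-light loop above (the exact `sample_L()` signature, `is_delta_light()`, and other member names are assumptions rather than the project's exact code):

```cpp
// Sketch only: importance sampling of the light sources.
Spectrum PathTracer::estimate_direct_lighting_importance(const Ray& r,
                                                         const Intersection& isect) {
  Matrix3x3 o2w;
  make_coord_space(o2w, isect.n);
  Matrix3x3 w2o = o2w.T();
  Vector3D hit_p = r.o + r.d * isect.t;
  Vector3D w_out = w2o * (-r.d);

  Spectrum L_out(0, 0, 0);
  for (SceneLight* sl : scene->lights) {
    int n = sl->is_delta_light() ? 1 : ns_area_light;  // point lights need 1 sample
    Spectrum L_light(0, 0, 0);
    for (int i = 0; i < n; i++) {
      Vector3D wi_world;
      float dist_to_light, pdf;
      Spectrum radiance = sl->sample_L(hit_p, &wi_world, &dist_to_light, &pdf);
      Vector3D wi = w2o * wi_world;            // object-space direction
      if (wi.z < 0) continue;                  // the light lies behind the surface
      // Shadow ray toward the light; stop just short so we don't hit the light itself.
      Ray shadow(hit_p + EPS_D * wi_world, wi_world);
      shadow.max_t = dist_to_light - EPS_D;
      Intersection shadow_isect;
      if (!bvh->intersect(shadow, &shadow_isect)) {
        // Unoccluded: weight by the BSDF, the cosine term, and the sampling pdf.
        L_light += radiance * isect.bsdf->f(w_out, wi) * wi.z / pdf;
      }
    }
    L_out += L_light * (1.0 / n);
  }
  return L_out;
}
```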
File: dragon.dae Camera rays per pixel: 64 Samples per area light: 32 |
File: wall-e.dae Camera rays per pixel: 64 Samples per area light: 32 |
Below we compare the noise levels in soft shadows in CBbunny.dae
for various numbers of light rays with 1 ray per pixel.
Samples per area light: 1 | Samples per area light: 4 |
Samples per area light: 16 | Samples per area light: 64 |
It can be shown that the variance of the estimate decreases in proportion to $1/n$, where $n$ is the number of samples.
We now directly compare the results of the two lighting functions side-by-side for the same number of samples.
Uniform Hemisphere Sampling | Importance Sampling |
Camera rays per pixel: 16 Samples per area light: 8 |
Camera rays per pixel: 16 Samples per area light: 8 |
Camera rays per pixel: 64 Samples per area light: 32 |
Camera rays per pixel: 64 Samples per area light: 32 |
Camera rays per pixel: 128 Samples per area light: 64 |
Camera rays per pixel: 128 Samples per area light: 64 |
We see that importance sampling converges much faster than uniform hemisphere sampling for the same parameters. The noise in uniform hemisphere sampling is due to the fact that only a small portion of the rays cast from the hit point actually intersect a light source directly, thus leading to significant dark patches. In contrast, in importance sampling we only consider those rays that contribute to the illumination of the surface point. This results in much smoother renderings for the same computational cost.
The real magic of path tracing comes from its ability to simulate full global illumination. We add support for indirect lighting, which is the light reflected from other surfaces.
The function `est_radiance_global_illumination()` first calls `zero_bounce_radiance()` to get the light from light sources that reaches the camera directly, using `isect.bsdf->get_emission()`. It then accumulates the light from `at_least_one_bounce_radiance()`, which is the main implementation work. This function works as follows:

1. Call `one_bounce_radiance()`, which is simply the importance sampling lighting function from Part 3. This gives us the direct lighting at the point.
2. Return only this direct lighting if the ray's depth (initialized to `max_ray_depth`) is less than 1, or if a Russian roulette coin flip tells us to terminate the path.
3. Otherwise, sample the BSDF at the intersection with `isect.bsdf->sample_f()`. Given the outgoing radiance direction `w_out`, this function returns the BSDF value as a `Spectrum` and computes the probabilistically sampled `w_in` unit vector giving the incoming radiance direction and a `pdf` float giving the probability density function evaluated at `w_in`.
4. Cast a new ray in the direction `w_in` (converted to world space coordinates) with the origin at the hit point. As usual, the ray's origin is offset by `EPS_D` times the ray's direction. We also make sure to decrement the ray's depth by 1.
5. If the new ray intersects the `bvh`, we obtain the radiance along the ray by recursively calling `at_least_one_bounce_radiance()` on the new ray. We then accumulate this indirect light by weighting it by the current surface intersection's BSDF as well as the cosine of the unit vector `w_in`. We additionally divide by the `pdf` and the Russian roulette continuation probability.

The algorithm is fairly intuitive, but the Russian roulette step may seem odd. Obviously we cannot allow infinitely many bounces of light, but it turns out that setting a maximum depth leads to a biased estimate. Russian roulette provides an unbiased estimate by culling some of the paths and sharing their contribution among the surviving paths. Here, we use it in combination with a max ray depth to get a reasonably accurate but practical estimate of global illumination.
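A sketch of the recursion, with `coin_flip()` and the 0.7 continuation probability as assumed stand-ins for the actual termination logic:

```cpp
// Sketch only: one level of the indirect lighting recursion.
Spectrum PathTracer::at_least_one_bounce_radiance(const Ray& r,
                                                  const Intersection& isect) {
  Matrix3x3 o2w;
  make_coord_space(o2w, isect.n);
  Vector3D hit_p = r.o + r.d * isect.t;
  Vector3D w_out = o2w.T() * (-r.d);

  // Direct lighting at this point (the Part 3 importance-sampling estimate).
  Spectrum L_out = one_bounce_radiance(r, isect);

  double cpdf = 0.7;  // Russian roulette continuation probability (assumed value)
  if (r.depth <= 1 || !coin_flip(cpdf)) return L_out;

  // Sample an incoming direction from the BSDF.
  Vector3D w_in;
  float pdf;
  Spectrum f = isect.bsdf->sample_f(w_out, &w_in, &pdf);

  // Bounce a new ray (offset to avoid self-intersection) one level deeper.
  Vector3D w_in_world = o2w * w_in;
  Ray bounce(hit_p + EPS_D * w_in_world, w_in_world);
  bounce.depth = r.depth - 1;

  Intersection bounce_isect;
  if (bvh->intersect(bounce, &bounce_isect)) {
    Spectrum Li = at_least_one_bounce_radiance(bounce, bounce_isect);
    // Weight by the BSDF and cosine term; divide by the pdf and the
    // continuation probability to keep the estimate unbiased.
    L_out += Li * f * w_in.z / (pdf * cpdf);
  }
  return L_out;
}
```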
Below is a rendering of bunny.dae
with global illumination using 1024 samples per pixel, 4 samples per area light, and a max ray depth of 5.
Observe the nice ambient color bleeding from the walls on the surface of the bunny. Another rendering of wall-e.dae
with the same parameters is shown below.
Below is a rendering of CBspheres_lambertian.dae
with ONLY direct lighting (left) and ONLY indirect lighting (right). Both renders are using 1024 samples per pixel, 4 samples per area light, and a max ray depth of 5.
Parts of the scene lit only by indirect bounces of light, such as the ceiling and the undersides of the spheres, appear only in the indirect lighting render.
Below we compare rendered views of CBbunny.dae
with different max_ray_depth
values. Each rendering uses 1024 samples per pixel and 4 samples per area light.
max_ray_depth : 0 |
max_ray_depth : 1 |
max_ray_depth : 2 |
max_ray_depth : 3 |
max_ray_depth : 4 |
max_ray_depth : 100 |
The biggest change occurs from max_ray_depth
of zero (which is essentially direct lighting) to one. The scene converges after 3 bounces, at which point increasing the max_ray_depth
further has no discernible effect on the result.
Below we compare rendered views of CBspheres_lambertian.dae
with various sample-per-pixel rates, all with 4 samples per area light and a max ray depth of 5.
Camera rays per pixel: 1 | Camera rays per pixel: 2 |
Camera rays per pixel: 4 | Camera rays per pixel: 8 |
Camera rays per pixel: 16 | Camera rays per pixel: 64 |
Camera rays per pixel: 1024 |
We see that global illumination produces great results, but requires a high number of samples per pixel to get the noise under control. The 800x600 resolution renders using 1024 rays per pixel take about 30 minutes on the instructional machines.
Not every pixel requires a large number of samples to eliminate noise. Certain parts of the scene converge faster with low sampling rates while other parts may require thousands of samples. Adaptive sampling is an optimization technique that avoids using a fixed high number of samples per pixel by concentrating the work in the more difficult parts of the image.
We implement a very simple algorithm that uses statistics to indicate whether a pixel has converged. Given $n$ samples with a mean of $\mu$ and standard deviation of $\sigma$, we quantify the convergence with

$$I = 1.96 \cdot \frac{\sigma}{\sqrt{n}}.$$

If $I \le \text{maxTolerance} \cdot \mu$, where `maxTolerance = 0.05` by default, we assume that the pixel has converged and stop tracing additional rays for the pixel.

We implement this approach in `PathTracer::raytrace_pixel()`. Within the loop, we use each sample's illuminance $x_k$ to accumulate two variables:

$$s_1 = \sum_{k=1}^{n} x_k, \qquad s_2 = \sum_{k=1}^{n} x_k^2.$$

The mean and variance of all samples so far are thus given by

$$\mu = \frac{s_1}{n}, \qquad \sigma^2 = \frac{1}{n - 1}\left(s_2 - \frac{s_1^2}{n}\right).$$

Every `samplesPerBatch` iterations of the loop, we update the estimates of $\mu$ and $\sigma$ and check for convergence to see if we can terminate the loop early. We track the number of samples used in a buffer in order to generate a sampling-rate image at the end.
As an implementation detail, I spent some time debugging this function. It turned out I was incorrectly averaging samples by dividing by `num_samples` regardless of whether the loop terminated early due to adaptive sampling.
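A sketch of the adaptive version of the pixel loop, including that fix (dividing by the number of samples actually taken); `illum()`, `samplesPerBatch`, `maxTolerance`, and `sampleCountBuffer` are assumed names:

```cpp
#include <cmath>

// Sketch only: per-pixel loop with batched convergence checks.
Spectrum PathTracer::raytrace_pixel(size_t x, size_t y) {
  Spectrum total(0, 0, 0);
  double s1 = 0, s2 = 0;       // running sums of illuminance and its square
  int num_samples = ns_aa;
  int n = 0;
  for (; n < num_samples; n++) {
    // Check convergence once per batch of samples.
    if (n > 0 && n % samplesPerBatch == 0) {
      double mu = s1 / n;
      double var = (s2 - s1 * s1 / n) / (n - 1);
      double I = 1.96 * sqrt(var / n);
      if (I <= maxTolerance * mu) break;   // converged: stop sampling early
    }
    Vector2D rnd = gridSampler->get_sample();
    Ray ray = camera->generate_ray((x + rnd.x) / sampleBuffer.w,
                                   (y + rnd.y) / sampleBuffer.h);
    Spectrum sample = est_radiance_global_illumination(ray);
    total += sample;
    double ill = sample.illum();
    s1 += ill;
    s2 += ill * ill;
  }
  sampleCountBuffer[x + y * sampleBuffer.w] = n;  // for the sampling-rate image
  // Divide by the number of samples actually taken, not the maximum allowed.
  return total * (1.0 / n);
}
```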
Below is a view of CBspheres_lambertian.dae
with 2048 samples per pixel, 1 sample per area light and a max ray depth of 5.
Below is a “heat map” of the image; red and blue represent high and low sampling rates, respectively. The sampling rate is computed as the ratio between the number of samples actually taken for a given pixel and the maximum number of samples allowed.
We see that lighter areas in the scene like the ceiling light and floor converged relatively quickly while darker areas such as the sphere’s surface facing the camera took longer.
Below is another rendering of CBbunny.dae
with the same parameters using adaptive sampling. The rate image is shown beneath it.
Adaptive sampling thus provides noise-free images with a running time no longer than rendering without it. Convergence still takes a considerable amount of time, however; the two images above took about 20 minutes on the instructional machines.