Interactive visual computing using GPUs

Graphic depicting Prof. Rui Wang's research

August 20, 2013

Advances in visual computing, particularly in computer-generated imagery, have profoundly changed the way we express ideas, create content, exchange information, and interact with machines. Today, visual computing is an indispensable component of many fields, including design, prototyping, data analysis, medical imaging, digital preservation, training, e-commerce, and education. Despite the tremendous progress in recent years, generating convincing imagery at interactive rates remains a major challenge in graphics. For example, in the physical world, trillions of photons can simultaneously interact with the scene, leading to an equilibrium state instantly; in the digital world, however, we have to simulate such complex interactions with very limited processing power. As a result, generating a photorealistic image often takes minutes to hours, severely limiting the user's productivity.

Associate Professor Rui Wang's research aims to enable interactive visual computing by exploiting modern graphics processors (GPUs). Today's GPUs have emerged as low-cost, massively parallel computation platforms with thousands of cores, offering computation speed and memory bandwidth that are often orders of magnitude higher than those of their CPU counterparts. Harnessing the GPU's parallel processing capability can provide an economical way to tackle computationally expensive tasks. However, making full use of the GPU's potential is a non-trivial problem. "One challenge is that many of our algorithms are not naturally expressed in parallel steps," says Wang. "For example, the simple problem of finding the maximum value in a large set of elements usually requires sequentially comparing every element with a temporary maximum. Since every comparison depends on the outcome of the previous one, the algorithm as is does not allow sharing the workload among multiple processors."
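The maximum-value example can be made concrete with a small sketch. The function names `sequential_max` and `tree_max` are illustrative and do not come from Wang's work, and the code runs on the CPU in plain Python. The first function is the sequential scan described in the quote. The second is a pairwise (tree) reduction, a standard way to restructure the problem so that all comparisons within one round are independent of each other and could, in principle, be assigned to separate processors.

```python
def sequential_max(values):
    """Scan left to right; each comparison depends on the previous result."""
    current = values[0]
    for v in values[1:]:
        if v > current:
            current = v
    return current


def tree_max(values):
    """Pairwise reduction; returns (maximum, number of rounds).

    Within one round every pair is compared independently, so the
    comparisons of a round could run in parallel. Here they simply
    run one after another.
    """
    level = list(values)
    rounds = 0
    while len(level) > 1:
        # Compare neighbors (0,1), (2,3), ...; no pair depends on another.
        reduced = [max(level[i], level[i + 1])
                   for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            reduced.append(level[-1])  # unpaired element carries over
        level = reduced
        rounds += 1
    return level[0], rounds
```

For eight elements, `tree_max([3, 9, 2, 7, 5, 1, 8, 4])` returns `(9, 3)`: the same maximum as the sequential scan, found in three rounds of independent comparisons rather than seven dependent ones.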

Another challenge is that the best-known algorithms for solving a problem on the CPU are often not optimal on the GPU. For example, quick-sort, one of the best sequential sorting algorithms, is actually quite difficult and inefficient to parallelize on the GPU. Also, although GPUs resemble CPU clusters, they impose different resource constraints. For example, data transfer on the GPU is relatively fast, but branching and divergence in computation can be very costly.
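To illustrate what a more GPU-friendly alternative to quick-sort can look like, here is a minimal sketch of bitonic sort, a classic sorting network. The article does not name this algorithm; it is chosen here only as a well-known example, and the function name `bitonic_sort` is illustrative. Its sequence of compare-and-swap steps is fixed in advance and does not depend on the data, and within each pass every element is paired with exactly one partner, so all pairs in a pass could be processed in parallel with little divergence. This version assumes the input length is a power of two and runs sequentially in plain Python.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two using a bitonic network."""
    n = len(values)
    assert n & (n - 1) == 0, "length must be a power of two"
    a = list(values)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared elements
            # One pass: each index i has a unique partner i ^ j, so all
            # compare-and-swap steps in this loop are independent.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The trade-off is that the network performs more comparisons in total than quick-sort does on average, which is one reason the best sequential algorithm and the best parallel algorithm for a problem can differ.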

"Given these challenges, developing new algorithms to exploit the GPU is no longer a mere engineering practice, but requires fundamentally rethinking our existing models and algorithms," notes Wang. To this end, Wang's research is focused on studying new mathematical models and efficient computational algorithms for visual computing, driven by the data-parallel architecture of GPUs. He summarizes his research contributions in three categories:

1. Precomputed Light Transport. In image synthesis, precomputed light transport (PLT) is a data-driven approach for interactive rendering with complex lighting. It works by decomposing the lighting domain into a suitable linear basis set, precomputing the scene's appearance under each basis, then applying the precomputed data at run-time to achieve high-quality rendering at interactive rates. Because users can modify light sources on the fly, it is particularly useful for lighting design applications, and is increasingly adopted in video games and commercial software. Wang has studied PLT extensively in previous work. His first contribution is to advance the state of the art by enabling dynamic material effects such as glossy surface reflections and translucency. This allows users to modify not only light sources, but also material properties interactively. The second contribution is a GPU-based algorithm that speeds up PLT by adapting its underlying computations to data-parallel, GPU-friendly models. This led to a 10- to 50-fold performance gain over an optimized CPU equivalent with the same rendering accuracy. The third contribution is a new lighting basis that employs non-linear approximation methods to preserve rendering fidelity while maximally reducing the precomputed data size. This in turn benefits GPU-based computation because smaller data size leads to coherent memory access and better utilization of the GPU's cache.

2. Photorealistic Rendering of Dynamic Scenes. Although PLT is attractive for visual design and previewing, its precomputation requirement makes it unsuitable for applications involving dynamic geometry and deformable objects. More recently, Wang has focused on new methods that allow users to modify any part of the scene on the fly. Together with his collaborators at Zhejiang University, he presented the first GPU-based algorithm for fully dynamic scenes that integrates a wide range of lighting effects, including multi-bounce indirect lighting, glossy reflections, caustics, and arbitrary specular paths. Their method builds upon the principles of sparse sampling and interpolation, which generally require progressively inserting new samples where the predicted error is high. The progressive insertion step creates data dependencies between successive samples and prevents parallel computation. The key to their method is a way to decouple the sample selection and evaluation steps, making both parallelizable on the GPU. The result is a speedup of one to two orders of magnitude over traditional methods. In addition, some of the components they have developed, including GPU-based kd-tree construction and query and k-means clustering, are useful for general-purpose computations in other applications as well.

3. Stochastic Sampling. Stochastic sampling is a critical component in digital image synthesis. Samples with good spectral distribution properties (such as blue noise) are essential for improving simulation speed, reducing aliasing artifacts, and producing visually pleasing textures and patterns. Working with graduate student John Bowers, Wang has proposed the first GPU-based algorithm for computing blue noise samples on the surfaces of arbitrary 3D objects. Not only is their algorithm 10x faster than the previous best-known algorithm, but they have also presented a new quantitative method to measure the spectral distribution quality of surface samples. Most recently, Wang has worked with graduate student Yahan Zhou to introduce the first algorithm that can generate samples with any user-specified distribution function. With an efficient GPU-based implementation, the user can interactively synthesize new samples that mimic the distribution property of any exemplar sample set.
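As a point of reference for the blue noise samples mentioned in the third category, the sketch below uses naive dart throwing, a textbook method for producing a Poisson-disk (blue noise) point set in the unit square. It is not the GPU-based surface sampling algorithm described above; the function name `dart_throwing` and its parameters are illustrative. Each candidate must be checked against all previously accepted samples, which is exactly the kind of sequential dependency that makes such methods hard to parallelize directly.

```python
import math
import random


def dart_throwing(min_dist, max_attempts, seed=0):
    """Return points in the unit square that are pairwise >= min_dist apart.

    Candidates are drawn uniformly at random and accepted only if they
    keep the minimum distance to every sample accepted so far.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(max_attempts):
        candidate = (rng.random(), rng.random())
        if all(math.dist(candidate, s) >= min_dist for s in samples):
            samples.append(candidate)
    return samples


if __name__ == "__main__":
    points = dart_throwing(min_dist=0.1, max_attempts=2000)
    print(len(points), "samples accepted")
```

The number of accepted samples depends on the random seed and the attempt budget, so no specific count is claimed here. What the method guarantees is only the minimum spacing between samples.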
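The run-time step of precomputed light transport from the first category can also be sketched in a few lines. The sketch assumes that the scene's appearance under each lighting basis function has already been precomputed and stored as a flat list of grayscale pixel intensities. The names `relight`, `basis_images`, and `light_coeffs` are hypothetical. Because light transport is linear in the lighting, the image under any lighting expressed in that basis is a weighted sum of the precomputed images. Every pixel is computed independently, which is what makes this step map well onto data-parallel hardware.

```python
def relight(basis_images, light_coeffs):
    """Combine precomputed per-basis images into one relit image.

    basis_images: one image per lighting basis function, each a flat
                  list of pixel intensities (assumed precomputed offline)
    light_coeffs: coefficients of the current lighting in that basis
    """
    num_pixels = len(basis_images[0])
    # Each output pixel is an independent weighted sum over the basis.
    return [
        sum(c * image[p] for c, image in zip(light_coeffs, basis_images))
        for p in range(num_pixels)
    ]


# Two basis lights and a three-pixel image, purely for illustration.
basis_images = [
    [1.0, 0.0, 0.5],  # scene lit by basis light 0 only
    [0.0, 1.0, 0.5],  # scene lit by basis light 1 only
]
image = relight(basis_images, light_coeffs=[2.0, 0.5])
```

Changing the lighting only changes `light_coeffs`, so no new light simulation is needed at run-time. Practical systems use far larger bases and compress the precomputed data, which this sketch does not attempt.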

In addition to the above, Wang has worked extensively on using GPUs for more general-purpose computations, including computing the Singular Value Decomposition (SVD) of large matrices, building efficient spatial data structures for high-dimensional datasets, creating geometric puzzles, and reconstructing 3D scenes.

"The rapid growth in GPU's computation power will continue to expand the frontiers of visual computing in the future," says Wang. "For sustained quality and speed improvements, it is essential to develop innovative algorithms and models that can adapt to the massively parallel architecture of the GPU. I hope to contribute new ideas and insights to help tackle some of the challenges in this direction."

Wang joined UMass Amherst in 2006. He received his Ph.D. in Computer Science from the University of Virginia in 2006 and his B.S. in Computer Science from Zhejiang University in 2001. He received an NSF CAREER Award in 2008 and an ACM Recognition of Service Award in 2011, and was a program co-chair for the ACM Symposium on Interactive 3D Graphics and Games (i3D) in 2012.