Graphics Hardware Research

While my early research focused on GPU architectures for non-graphics, or "general purpose", computing (GPGPU), some colleagues in industry pointed out that their companies still sell GPUs for graphics and encouraged (and funded) me to explore hardware support for graphics. This project started out with a focus on adding support for raster-based graphics to GPGPU-Sim and studying the resulting challenges in mobile systems-on-chip (SoCs). More recently (and with more funding) it has progressed to exploring hybrid rendering and ray tracing. Over the years this project has received support from a variety of sources, including an NSERC Collaborative Research Grant with Qualcomm, a Google Faculty Research Award, gift funding from Activision and, most recently, a large grant from Huawei to support research on ray tracing support in GPUs.

Hybrid rendering

Hybrid rendering combines ray tracing and rasterization to generate photorealistic computer-generated images at real-time frame rates. Ray tracing excels at computing photorealistic images because it can follow reflection and refraction rays. By imitating how light propagates in the real world, it enables true global illumination in computer-generated images.
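The reflection and refraction rays mentioned above follow from the law of reflection and Snell's law. The following is a minimal Python sketch of how a ray tracer might compute those secondary ray directions. It assumes unit-length vectors stored as 3-tuples and a surface normal that points against the incoming ray. The function names `reflect` and `refract` are illustrative only and are not taken from any of the simulators described on this page.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect(d, n):
    """Mirror the incident direction d about the unit normal n."""
    k = 2.0 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))

def refract(d, n, eta):
    """Bend d through a surface using Snell's law.

    eta is the ratio n1/n2 of the refractive indices on the incident
    and transmitted sides. Returns None on total internal reflection.
    """
    cos_i = -dot(d, n)
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # no transmitted ray exists
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * di + (eta * cos_i - cos_t) * ni for di, ni in zip(d, n))

# A ray travelling down and to the right hits a floor whose normal is +y.
s = 1.0 / math.sqrt(2.0)
incoming = (s, -s, 0.0)
normal = (0.0, 1.0, 0.0)
bounced = reflect(incoming, normal)          # now travels up and to the right
straight = refract(incoming, normal, 1.0)    # equal indices: direction unchanged
trapped = refract((math.sqrt(0.75), -0.5, 0.0), normal, 1.5)  # None
```

Each such secondary ray must itself be traced through the scene, which is one reason the cost of ray tracing grows quickly with visual fidelity.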

However, tracing rays is very expensive in terms of both energy consumption and render time. Rasterization, on the other hand, sacrifices some visual quality for render speed, achieving a relatively high degree of visual accuracy in real time. It is primarily used in the video game industry, where rates of 60 frames per second are needed for smooth gameplay. Raster-based rendering has benefited from the custom-tailored, ever-evolving hardware architecture of Graphics Processing Units (GPUs).

Recent advances in GPU architecture have incorporated ray tracing elements into current rasterization-based GPUs. This combination is typically referred to as hybrid rendering. This research seeks to explore the hybrid rendering design space from both a software and a hardware perspective.

Graphics simulation

With billions of users, mobile SoCs are the most ubiquitous computing platforms. However, there is a need in academia for architecture research tools to study them. SoCs have become increasingly heterogeneous and complex: a typical SoC can include CPUs, GPUs, image processors, video encoders/decoders, digital signal processors (DSPs) and 2D engines, among others.

Architecture research tools have not kept pace with recent advancements in SoCs. Research tools focus on multi-core CPU systems and lack support for other specialized IP cores such as GPUs and DSPs. These IP cores typically share the same main memory (DRAM) with general-purpose CPUs, opening the door to a plethora of possible interactions. Going forward, it is therefore crucial for architecture research to consider system-wide interactions when optimizing for emerging workloads.

This research developed Emerald, the first publicly available simulator that models heterogeneous mobile SoC systems in sufficient detail to enable high-quality hardware architecture research. One of Emerald's main differentiators is support for detailed timing of modern GPU cores running OpenGL graphics workloads. Moreover, Emerald is designed to be extensible so that architecture researchers can add IP accelerators and study their impact on system performance and energy consumption. We are currently exploring IP blocks for accelerating augmented reality. The source code for Emerald can be found here.

More recently there has been a trend towards GPUs including dedicated hardware support for real-time ray traced lighting effects. We have explored hardware optimizations including ray-triangle intersection prediction, which avoids traversing the entire bounding volume hierarchy (BVH) tree for effects such as ambient occlusion, and prefetching of BVH treelets. This work required developing new GPU simulation tools and benchmarks: Vulkan-Sim and LumiBench.
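To illustrate why prediction can help with occlusion-style rays, here is a toy Python sketch of the general idea. It is a software simplification under stated assumptions, not the hardware design evaluated in the papers below. An occlusion ray only needs to know whether any triangle blocks it, so if a small table can guess a likely triangle from a coarse hash of the ray, a single ray-triangle test may replace a full BVH traversal. The table layout, the `ray_key` hash, the median-split `build` function and all other names are hypothetical.

```python
import math

EPS = 1e-9

def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def hits_triangle(orig, d, tri):
    """Moller-Trumbore test: True if the ray hits tri at some t > EPS."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < EPS:
        return False  # ray is parallel to the triangle plane
    s = sub(orig, v0)
    u = dot(s, p) / det
    q = cross(s, e1)
    v = dot(d, q) / det
    t = dot(e2, q) / det
    return u >= 0 and v >= 0 and u + v <= 1 and t > EPS

def hits_box(orig, d, lo, hi):
    """Slab test against an axis-aligned bounding box."""
    tmin, tmax = 0.0, math.inf
    for o, di, l, h in zip(orig, d, lo, hi):
        if abs(di) < EPS:
            if o < l or o > h:
                return False
        else:
            t0, t1 = (l - o) / di, (h - o) / di
            if t0 > t1:
                t0, t1 = t1, t0
            tmin, tmax = max(tmin, t0), min(tmax, t1)
            if tmin > tmax:
                return False
    return True

def build(tris, ids):
    """Median-split BVH; a node is (lo, hi, left, right, tri_id)."""
    pts = [p for i in ids for p in tris[i]]
    lo = tuple(min(p[k] for p in pts) for k in range(3))
    hi = tuple(max(p[k] for p in pts) for k in range(3))
    if len(ids) == 1:
        return (lo, hi, None, None, ids[0])
    axis = max(range(3), key=lambda k: hi[k] - lo[k])
    ids = sorted(ids, key=lambda i: sum(p[axis] for p in tris[i]))
    mid = len(ids) // 2
    return (lo, hi, build(tris, ids[:mid]), build(tris, ids[mid:]), None)

def any_hit(root, tris, orig, d, stats):
    """Full traversal: return the id of the first triangle found, else None."""
    stack = [root]
    while stack:
        lo, hi, left, right, tri_id = stack.pop()
        stats["nodes"] += 1
        if not hits_box(orig, d, lo, hi):
            continue
        if tri_id is None:
            stack += [left, right]
        elif hits_triangle(orig, d, tris[tri_id]):
            return tri_id
    return None

def ray_key(orig, d, cell=1.0, bins=4):
    """Hypothetical hash: quantized origin cell plus quantized direction."""
    return (tuple(math.floor(c / cell) for c in orig),
            tuple(round(c * bins) for c in d))

def occluded(root, tris, orig, d, table, stats):
    key = ray_key(orig, d)
    guess = table.get(key)
    if guess is not None and hits_triangle(orig, d, tris[guess]):
        stats["predicted"] += 1  # prediction verified: traversal skipped
        return True
    hit = any_hit(root, tris, orig, d, stats)  # fall back to full traversal
    if hit is not None:
        table[key] = hit  # remember the blocker for similar rays
    return hit is not None

# Eight triangles in a row at z = 5, probed by two nearby occlusion rays.
tris = [((x, 0.0, 5.0), (x + 1.0, 0.0, 5.0), (x, 1.0, 5.0)) for x in range(8)]
root = build(tris, list(range(len(tris))))
table, stats = {}, {"nodes": 0, "predicted": 0}
first = occluded(root, tris, (0.2, 0.2, 0.0), (0.0, 0.0, 1.0), table, stats)
nodes_after_first = stats["nodes"]
second = occluded(root, tris, (0.25, 0.2, 0.0), (0.0, 0.0, 1.0), table, stats)
```

In this sketch the second ray maps to the same table entry as the first, so it is resolved by one triangle test and visits no BVH nodes. A wrong guess costs one extra triangle test before the normal traversal runs, so results stay correct either way.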


Yuan Hsi Chou, Tyler Nowicki, Tor M. Aamodt, Treelet Prefetching For Ray Tracing, In proceedings of the 56th IEEE/ACM International Symposium on Microarchitecture (MICRO 2023), Toronto, ON, Canada, Oct 28-Nov 1, 2023. (acceptance rate: 101/424 ≈ 23.8%) code
Lufei Liu, Mohammadreza Saed, Yuan Hsi Chou, Davit Grigoryan, Tyler Nowicki, Tor M. Aamodt, LumiBench: A Benchmark Suite for Hardware Ray Tracing, In proceedings of the 2023 IEEE International Symposium on Workload Characterization (IISWC 2023), Ghent, Belgium, Oct 1-3, 2023. (acceptance rate: 15/47 ≈ 31.9%) Best Paper Award code.
Mohammadreza Saed, Yuan Hsi Chou, Lufei Liu, Tyler Nowicki, Tor M. Aamodt, Vulkan-Sim: A GPU Architecture Simulator for Ray Tracing, In proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture (MICRO 2022), Chicago, IL, October 1-5, 2022. (acceptance rate: 83/369 ≈ 22.5%) code video
Lufei Liu, Wesley Chang, Francois Demoullin, Yuan Hsi Chou, Mohammadreza Saed, David Pankratz, Tyler Nowicki, Tor M. Aamodt, Intersection Prediction for Accelerated GPU Ray Tracing, In proceedings of the 54th IEEE/ACM International Symposium on Microarchitecture (MICRO), Athens, Greece, October 16-20, 2021. (acceptance rate: 94/430 ≈ 21.9%) code video

Ayub Gubran, Tor M. Aamodt, Emerald: Graphics Modeling for SoC Systems, In proceedings of the ACM/IEEE International Symposium on Computer Architecture (ISCA'19), pp. 169-182, Phoenix, Arizona, June 22-26, 2019. (slides)