Abstract: High-performance computing resources, such as multi-core Central Processing Units (CPUs) and Graphics Processing Units (GPUs), are readily available to programmers due to the immense popularity of computationally demanding applications, such as machine learning and image processing. Programmers today can access these resources in a variety of ways, including direct access and via the cloud. However, writing programs that utilize these resources efficiently and correctly remains a significant challenge.
This thesis proposal presents new abstractions, domain-specific languages, and code generation techniques for programming high-performance systems. First, I present techniques to optimize image processing programs on modern GPUs that improve concurrency and register utilization. Second, I present the first system that accelerates graph sampling on GPUs. Third, I present a domain-specific language and a compiler that co-optimize communication and computation in distributed machine learning workloads. Finally, I present a foundational semantics of serverless computing, which programmers can use to reason about their code easily. Together, these techniques help programmers write correct and efficient code.
Advisor: Arjun Guha