Speaker

Nathan Kwan-Ho Ng

Abstract

In recent years, a new class of latency-sensitive applications demanding stringent low-latency guarantees, such as mobile augmented and virtual reality (AR/VR) and real-time machine learning inference, has emerged. Edge computing, which places compute and storage resources closer to end users, is a promising approach for addressing the needs of such applications by reducing network latency. However, edge servers are typically more resource-constrained than traditional cloud servers. Under real-world dynamic conditions, such as workload bursts or resource contention from multi-tenancy, edge servers may suffer degraded performance because of their limited capacity to elastically scale applications. This underscores the need for intelligent resource management techniques to fully realize the potential of edge computing.

This thesis proposes resource management techniques that mitigate the performance degradation caused by such real-world dynamics and thereby ensure robust performance.

First, I propose TailClipper, a distributed scheduler that minimizes the tail latency of distributed services through system-wide scheduling. Unlike prior approaches that optimize performance at the level of individual components, TailClipper tags each request with its global arrival timestamp and propagates that timestamp across components, then applies a scheduling strategy inspired by queueing theory that combines global First-Come-First-Serve (FCFS) with Limited Processor Sharing (LPS) to mitigate performance degradation caused by distributed request-processing effects such as request reordering.

Second, to assess whether edge offloading remains beneficial in the era of programmable accelerators, I propose analytic models based on queueing-theoretic results that capture the behavior of device and edge accelerators in both local processing and offloading scenarios. These models account for dynamic factors such as the relative performance of the device and edge server, network variability, server load, and multi-tenancy on the edge server. I experimentally validate the models and demonstrate how a resource manager can use them to make intelligent offloading decisions under dynamic network conditions and in multi-tenant edge server environments.

Finally, I propose a multi-tenant inference serving system that enables efficient DNN inference on memory-constrained edge accelerators.
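To make the scheduling idea concrete, below is a minimal sketch (my illustration, not TailClipper's actual implementation) of one service component that dequeues requests in global-FCFS order, keyed by a timestamp stamped once at the system's entry point and carried with the request, while a semaphore caps in-service concurrency in the spirit of Limited Processor Sharing. The class, field names, and the multiprogramming limit MPL are hypothetical.

```python
import heapq
import itertools
import threading
import time

MPL = 4  # illustrative multiprogramming limit for LPS

class ComponentScheduler:
    """One component's queue: global-FCFS admission, LPS-style service."""

    def __init__(self, mpl=MPL):
        self._heap = []                         # min-heap keyed by global arrival time
        self._tiebreak = itertools.count()      # orders requests with equal timestamps
        self._cv = threading.Condition()
        self._slots = threading.Semaphore(mpl)  # at most `mpl` requests in service

    def enqueue(self, request):
        # `global_ts` was stamped at the system entry point and travels
        # with the request across all downstream components.
        with self._cv:
            heapq.heappush(self._heap,
                           (request["global_ts"], next(self._tiebreak), request))
            self._cv.notify()

    def run_worker(self, handler):
        # Each worker repeatedly takes the globally oldest queued request,
        # then serves it once one of the limited service slots is free.
        while True:
            with self._cv:
                while not self._heap:
                    self._cv.wait()
                _, _, request = heapq.heappop(self._heap)
            with self._slots:
                handler(request)

def admit(scheduler, payload):
    # Tag the request with its global arrival timestamp on admission.
    scheduler.enqueue({"global_ts": time.monotonic(), "payload": payload})
```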
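Similarly, the following sketch illustrates the kind of comparison the offloading models enable, under the simplifying assumption (mine, for illustration; the thesis's models capture richer behavior) that both the device and the edge accelerator behave as M/M/1 queues. Offloading pays off when the network round trip plus queueing at the shared edge accelerator beats local queueing on the slower device.

```python
def local_latency(mu_device, lam):
    # Expected M/M/1 response time on the device accelerator
    # (assumes Poisson arrivals and lam < mu_device for stability).
    return 1.0 / (mu_device - lam)

def edge_latency(mu_edge, lam, lam_tenants, rtt):
    # Offloaded response time: network round trip plus M/M/1 queueing at the
    # edge accelerator, which also absorbs co-located tenants' load.
    return rtt + 1.0 / (mu_edge - (lam + lam_tenants))

def should_offload(mu_device, mu_edge, lam, lam_tenants, rtt):
    return edge_latency(mu_edge, lam, lam_tenants, rtt) < local_latency(mu_device, lam)

# Example (hypothetical rates in requests/sec, RTT in seconds): a 4x-faster
# edge accelerator still wins at moderate tenant load and a 10 ms round trip.
print(should_offload(mu_device=50, mu_edge=200, lam=30, lam_tenants=120, rtt=0.010))  # True
```

Even this toy version exposes the factors the models must capture: relative accelerator performance (mu_device vs. mu_edge), network variability (rtt), and edge server load and multi-tenancy (lam_tenants).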

Advisor

Prashant Shenoy