
Deep Learning Model Serving on Edge Platforms

Thursday, 01/19/2023 10:00am to 12:00pm
Lederle Graduate Research Center, Room A311; Virtual via Zoom
PhD Dissertation Proposal Defense
Speaker: Qianlan Liang

Deep learning has become the de facto standard approach for data analysis problems in a wide range of fields, such as computer vision and natural language processing. Recently, with the proliferation of mobile devices and the Internet of Things (IoT), billions of mobile and IoT devices are generating enormous volumes of data that need to be analyzed in real time using deep learning. To meet the resulting high-computation and low-latency requirements, edge computing is employed: trained deep learning models are deployed to servers or devices near clients for inference. However, edge devices are often resource-constrained in terms of computational capacity and energy. In this thesis, I propose research on system designs that exploit both software and hardware characteristics to efficiently manage edge resources for deep learning model serving.

First, to understand these software and hardware characteristics, I propose a study to characterize model serving-based IoT applications on specialized edge architectures. I experimentally compare the benefits and limitations of specialized edge systems, built using edge accelerators, against more traditional forms of edge and cloud computing.

Second, due to their resource-constrained nature, edge servers and accelerators tend to be multiplexed across multiple IoT applications, introducing the potential for performance interference between latency-sensitive workloads. To address this problem, I propose to design analytic queueing models that capture the performance of model serving workloads on shared edge accelerators, such as GPUs and Edge TPUs, under different multiplexing and concurrency behaviors.
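As a first-cut illustration of what such an analytic model computes, the sketch below approximates a single shared accelerator as an M/M/1 queue. The arrival and service rates are hypothetical, and a real model of the kind proposed would also need to capture batching, multiplexing, and concurrency effects.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: a simple first-cut model
    of inference requests arriving at one shared edge accelerator.
    Rates are in requests per second; both values here are hypothetical."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)

# e.g., 80 req/s offered to an accelerator that serves 100 req/s
latency = mm1_response_time(80.0, 100.0)  # 0.05 s mean response time
```

Even this toy model shows why interference matters: as the combined arrival rate from co-located applications approaches the accelerator's service rate, mean response time grows without bound.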

Third, to support multiple applications with widely different latency, energy, and accuracy requirements on embedded edge accelerators with limited computational and energy resources, I propose a conditional execution framework based on multi-exit deep neural networks (DNNs), which enables fine-grained control over individual inference requests.
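The core idea of multi-exit inference can be sketched in a few lines: run the network stage by stage and stop at the first exit whose prediction confidence clears a per-exit threshold, trading accuracy for latency and energy. The stage and threshold structure below is a hypothetical illustration, not the proposed framework's actual API.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multi_exit_infer(x, stages, thresholds):
    """Early-exit inference sketch: each stage is a callable returning
    (features, exit_logits). Stop at the first exit whose top-1
    confidence meets that exit's threshold; otherwise fall through
    to the final exit."""
    for stage, threshold in zip(stages, thresholds):
        x, logits = stage(x)
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            break  # confident enough: skip the remaining (costlier) stages
    return probs.index(confidence), confidence
```

Per-request control comes from the thresholds: a latency-critical request can use low thresholds (exit early, save time and energy), while an accuracy-critical one can use high thresholds and run more of the network.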

Finally, many modern accelerators support Dynamic Voltage and Frequency Scaling (DVFS), which allows users to adjust the power and speed settings of the device. A system that supports application-specific DVFS is desirable because it allows applications to optimize their energy efficiency while still meeting their requirements. However, a traditional time-based fair-share scheduler becomes unfair once energy is taken into account. To address this problem, I propose to design a time- and energy-fair scheduler that tracks both time and energy usage to incentivize energy conservation.
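One way such a scheduler could work is to charge each application a combined "virtual usage" of accelerator time and energy, and always dispatch the application with the smallest accumulated charge. The sketch below illustrates this idea under stated assumptions; the class, the `alpha` trade-off weight, and the charging rule are hypothetical, not the proposed design itself.

```python
import heapq

class TimeEnergyFairScheduler:
    """Sketch of a time- and energy-fair scheduler: apps are charged
    a weighted sum of accelerator time and energy used, and the app
    with the least accumulated charge runs next. alpha (hypothetical
    parameter) trades off time fairness against energy fairness."""

    def __init__(self, apps, alpha=0.5):
        self.alpha = alpha
        # min-heap of (accumulated virtual usage, app name)
        self.heap = [(0.0, app) for app in apps]
        heapq.heapify(self.heap)

    def pick_next(self):
        """Dispatch the app with the smallest accumulated charge."""
        self._pending = heapq.heappop(self.heap)
        return self._pending[1]

    def charge(self, time_used, energy_used):
        """After the dispatched app runs, bill it for both resources."""
        usage, app = self._pending
        usage += self.alpha * time_used + (1 - self.alpha) * energy_used
        heapq.heappush(self.heap, (usage, app))
```

Under this rule, an application that picks a high-power DVFS setting accumulates charge faster than an energy-frugal one using the same amount of time, so the frugal application is scheduled more often, which is the incentive a purely time-based scheduler fails to provide.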

Advisor: Prashant Shenoy

Join the Zoom