
Leveraging Multi-Modality and Probabilistic Modeling for Robust Machine Learning in Resource-Constrained Environments

Monday, 12/04/2023 10:00am to 12:00pm
CS 243
PhD Dissertation Proposal Defense

Recent breakthroughs in deep learning have led to staggering performance improvements in many domains, making autonomous systems a critical component of many real-world use cases. This is especially true in the Internet of Things (IoT) domain, which imposes additional challenges due to environmental, networking, and hardware constraints. This thesis explores the use of multi-modality and probabilistic modeling to enable more robust machine learning at the edge.

We first employ multi-modality for the problem of zero-shot image classification. By utilizing a textual class hierarchy, we expose an accuracy-specificity trade-off that lets systems make accurate, albeit less specific, predictions under resource constraints. We then address the distributed execution of image classifiers: by splitting a neural network between an edge device and the cloud, partial execution on the edge produces latent features that are sent to the cloud for completion, yielding lower latency than conventional methods. Merging these strategies, we build a distributed, hierarchical object detector and validate it with a prototype on ultra-low-power edge hardware. We next evaluate the edge-hardware runtime of recent transformer-based object detectors and show how their unique characteristics simplify reasoning about bounding-box uncertainty compared to earlier methods.

Moving beyond traditional visual tracking, we focus on geospatial tracking, predicting 3D points in space rather than in the image plane. Using a multi-camera dataset with motion-captured geospatial ground truth, we train a probabilistic model of an object's position and fuse the resulting predictions with multi-observation Kalman filters, yielding an enhanced understanding of the uncertainty in an object's location.

The remaining work for this thesis includes extending this approach to multiple objects and to additional modalities such as audio, depth, and radar. We also aim to address resource scheduling across these modalities and views, leveraging the predicted uncertainty from each node to dynamically actuate sensors and minimize resource usage. We will further extend the approach to real-world data using GPS locations and real vehicles, and ultimately explore training an unsupervised sensor foundation model across all available data.
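As a rough illustration of the edge/cloud split execution described above, the sketch below partitions an off-the-shelf classifier so that the edge half emits latent features and the cloud half completes the prediction. The model choice, split point, and serialization format are assumptions made for illustration only, not the thesis prototype, which targets ultra-low-power hardware.

```python
# Minimal sketch of split inference between an edge device and the cloud.
# The ResNet-18 backbone, the split index, and torch.save as a stand-in for
# the network transfer are illustrative assumptions.
import io
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
layers = list(model.children())

edge_part = torch.nn.Sequential(*layers[:6])   # runs on the edge device
cloud_part = torch.nn.Sequential(*layers[6:-1], torch.nn.Flatten(), layers[-1])

@torch.no_grad()
def edge_forward(image_batch):
    """Compute latent features on the edge and serialize them for transmission."""
    latent = edge_part(image_batch)
    buffer = io.BytesIO()
    torch.save(latent, buffer)          # stand-in for sending bytes over the network
    return buffer.getvalue()

@torch.no_grad()
def cloud_forward(payload):
    """Deserialize the latent features and finish classification in the cloud."""
    latent = torch.load(io.BytesIO(payload))
    return cloud_part(latent)

logits = cloud_forward(edge_forward(torch.randn(1, 3, 224, 224)))
```

The latency argument rests on the latent tensor at the split point being cheaper to transmit than the raw image while keeping the heavy later layers off the edge device; the split index chosen here is arbitrary.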
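The fusion step, combining per-camera position estimates with multi-observation Kalman filters, can be pictured with a minimal sketch. Everything below, from the constant-velocity motion model to the dimensions and noise values, is an illustrative assumption rather than the thesis implementation; the fusion simply applies the standard Kalman update once per camera observation.

```python
# Minimal sketch: fuse per-camera (mean, covariance) position estimates
# with a constant-velocity Kalman filter. All dimensions and noise values
# are illustrative assumptions.
import numpy as np

class MultiObservationKalmanFilter:
    def __init__(self, dim=3, dt=0.1, process_var=1e-2):
        self.dim = dim
        self.x = np.zeros(2 * dim)                 # state: [position, velocity]
        self.P = np.eye(2 * dim)                   # state covariance
        self.F = np.eye(2 * dim)                   # constant-velocity transition
        self.F[:dim, dim:] = dt * np.eye(dim)
        self.Q = process_var * np.eye(2 * dim)     # process noise
        self.H = np.hstack([np.eye(dim), np.zeros((dim, dim))])  # observe position only

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, observations):
        """observations: list of (mean, covariance) pairs, one per camera node."""
        for z, R in observations:
            S = self.H @ self.P @ self.H.T + R          # innovation covariance
            K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
            self.x = self.x + K @ (z - self.H @ self.x)
            self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
        # Return the fused position estimate and its uncertainty.
        return self.x[:self.dim], self.P[:self.dim, :self.dim]

kf = MultiObservationKalmanFilter()
kf.predict()
pos, cov = kf.update([(np.array([1.0, 2.0, 0.5]), 0.05 * np.eye(3)),
                      (np.array([1.1, 1.9, 0.4]), 0.10 * np.eye(3))])
```

Sequentially applying the update for each camera is equivalent to a joint update when the measurement noises are independent, and the returned covariance is the kind of per-object uncertainty a resource scheduler could use when deciding which sensors to actuate.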

Advisor: Ben Marlin