PhD Thesis Defense: Jin Huang, Computation-Communication Co-Design for Efficient Deployment of Deep Learning and Vision-Language Models on Resource-Constrained IoT Devices
Speaker: Jin Huang
Abstract:
The proliferation of IoT devices and rapid advances in AI models demand low-latency inference over continuous sensory streams. In practice, deployment faces two coupled bottlenecks: limited compute and memory on edge devices, and constrained, time-varying wireless bandwidth when cloud assistance is required. Existing approaches treat model placement as a static decision and therefore optimize computation and communication separately. Real-time edge intelligence instead requires computation-communication co-design.
This dissertation presents five systems that jointly optimize computation and communication across two workload regimes. For CNN-based, single-pass perception pipelines over low-power radios, CLIO adapts transmitted feature size to time-varying bandwidth through progressive slicing. SPEX prevents delayed modalities from stalling inference by speculating over missing inputs and rolling back only when necessary. FLEET overlaps local execution with opportunistic offloading on duty-cycled radios and enables early exit through cloud-side multi-layer fusion. For vision-language and generative workloads, LiteVLM reduces on-device VLM latency through query-aware visual pruning, cloud-assisted token prioritization, and speculative decoding. MASQ addresses a refinement gap in cascaded offloading: once the local tier responds, the output remains unchanged until the cloud reply arrives. MASQ closes this gap by streaming prioritized visual evidence to a near-edge gateway for continuous anytime inference, escalating to the cloud only when cross-tier disagreement indicates true uncertainty.
Taken together, these systems show that computation-communication co-design is a practical design principle for real-time edge intelligence, yielding better latency-energy-accuracy trade-offs than decoupled designs across the IoT-cloud pipeline.
Advisor:
Deepak Ganesan