
Efficient Multi-Task Learning for Computer Vision Applications

Friday, 05/03/2024 2:00pm to 4:00pm
Hybrid - LGRC A215 & Zoom
PhD Dissertation Proposal Defense
Speaker: Lijun Zhang

Multi-task models, a type of deep neural network (DNN) designed to tackle multiple prediction tasks simultaneously, have garnered considerable attention in the literature under the paradigm of multi-task learning (MTL). This attention is driven by their demonstrated capability to significantly enhance task performance and inference efficiency compared to single-task models, rendering them particularly suitable for deployment in resource-constrained settings. 
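To make the idea concrete, the following is a minimal sketch (not any specific system from this work) of the classic hard-parameter-sharing design: a shared trunk computed once per input, with lightweight task-specific heads. All layer sizes and task names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared "trunk" parameters, used by every task (hard parameter sharing).
W_shared = rng.standard_normal((16, 8))

# Task-specific "head" parameters: e.g. classification (10 classes) and depth (1 value).
W_cls = rng.standard_normal((8, 10))
W_depth = rng.standard_normal((8, 1))

def forward(x):
    """One forward pass yields predictions for both tasks at once,
    so the shared trunk is computed only a single time."""
    h = np.maximum(x @ W_shared, 0.0)   # shared representation (ReLU)
    return h @ W_cls, h @ W_depth       # per-task outputs

x = rng.standard_normal((4, 16))        # a batch of 4 inputs
logits, depth = forward(x)
print(logits.shape, depth.shape)        # (4, 10) (4, 1)
```

The efficiency gain comes from amortizing the trunk computation across tasks, at the cost of forcing those tasks to share one representation.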

However, creating multi-task models presents several challenges. The most critical is determining which parameters to share across tasks so as to maximize both task accuracy and inference efficiency. Historically, multi-task models have relied on manual design informed by domain expertise, often yielding sub-optimal task performance. More recent generative AI models such as diffusion models adopt a straightforward approach by sharing all parameters across tasks; it remains unclear whether alternative parameter-sharing patterns could enhance model generalization. Other challenges include long model training times and the significant manual effort required to program multi-task models. 

In this thesis proposal, we aim to advance the state-of-the-art in MTL methodologies by addressing the challenges in creating multi-task models that excel in both task accuracy and resource efficiency. We begin by revisiting the common practice in multi-task model design, wherein initial DNN layers are shared across tasks before diverging at later layers. Our empirical evidence across diverse visual domains challenges this convention, revealing that task-specific bottom-layer parameters can significantly outperform the conventional shared design. 
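The inverse pattern can be sketched in the same minimal style as above: each task owns its bottom layer, while a later layer is shared. This is a hypothetical illustration of the sharing pattern only; layer sizes and task names are assumptions, not the proposal's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Task-specific bottom layers (one per task) followed by a shared upper
# layer: the inverse of the usual "share early, split late" design.
tasks = ("seg", "depth")
W_bottom = {t: rng.standard_normal((16, 8)) for t in tasks}
W_top = rng.standard_normal((8, 8))                  # shared across tasks
heads = {t: rng.standard_normal((8, 4)) for t in tasks}

def forward(task, x):
    h = np.maximum(x @ W_bottom[task], 0.0)  # task-specific early features
    h = np.maximum(h @ W_top, 0.0)           # shared later representation
    return h @ heads[task]

x = rng.standard_normal((2, 16))
print(forward("seg", x).shape, forward("depth", x).shape)
```

The intuition is that low-level features may differ more across visual domains than the conventional design assumes, so sharing is deferred to later layers.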

Additionally, we developed two programming frameworks for automatically creating efficient and accurate multi-task models for a given set of vision tasks. The first framework focuses on designing tree-structured multi-task models, while the second extends this concept to more generalized parameter-sharing patterns. Both frameworks take the specification of a backbone DNN as input and automatically transform it into multi-task models that achieve high task accuracy under resource constraints. 
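One way to picture a tree-structured multi-task model is by the backbone layer index at which each task leaves the shared trunk. The sketch below is purely illustrative (the spec format and names are assumptions, not the frameworks' actual API) and counts how many layer instances the resulting tree needs:

```python
# Hypothetical spec: each task branches off the shared trunk at a layer index.
BACKBONE_DEPTH = 4                      # backbone has layers 0..3
branch_at = {"segment": 3, "depth": 3, "normals": 1}

def count_layer_copies(branch_at, depth):
    """Total layer instances in the tree: one shared trunk copy of layer i
    while any task is still unbranched there, plus a private copy of layer i
    for every task that has already branched."""
    total = 0
    for i in range(depth):
        shared = any(b > i for b in branch_at.values())   # trunk still alive?
        private = sum(b <= i for b in branch_at.values()) # branched tasks
        total += int(shared) + private
    return total

print(count_layer_copies(branch_at, BACKBONE_DEPTH))  # prints 8
```

Under this toy spec, a fully shared trunk would use 4 layer instances and three fully separate models would use 12; searching over branch points trades accuracy against such resource costs.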

Motivated by recent advances in diffusion models, we propose two future directions. The first is to examine diffusion models through the lens of MTL and study how alternative parameter-sharing strategies could affect the training efficiency and image-generation quality of diffusion models. The second is to study how the generative capabilities of diffusion models can be leveraged to generate the parameters of multi-task models, thereby reducing model training time. The proposed thesis will bridge traditional MTL approaches with groundbreaking generative AI, fostering mutual advancements in both fields.

Advisor: Hui Guan
