
MLSys Seminar: Optimizing Foundation Models at Scale

Tuesday, 11/14/2023 11:00am to 12:00pm
Lederle Graduate Research Center, Room A311
Seminar

Abstract: Foundation models, like most current large language models, are expensive to train, fine-tune, and run. Even so, they are increasingly being adopted because of the significant improvements they bring to downstream tasks. Since hardware and energy costs are not falling at the rate these models are growing, optimizing every stage of a model's lifetime keeps growing in importance. This talk will cover our efforts to bring together multiple optimization techniques coming from research labs in academia and industry in a way that is easily usable and composable. The first tangible result is our implementation of Llama (and other decoder-only models), which is the first to successfully integrate optimizations such as FlashAttention, torch.compile(), CUDA graphs, and Tensor Parallel for inference, as well as Fully Sharded Data Parallel and activation checkpointing for training and fine-tuning. The talk will also cover our ongoing work toward future models.
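
As context for the techniques named in the abstract, here is a minimal, hypothetical PyTorch sketch, not the speaker's implementation: a toy decoder block (all names and dimensions invented for illustration) routes attention through scaled_dot_product_attention, which dispatches to a FlashAttention kernel when hardware and dtypes allow, and is then wrapped in torch.compile, whose "reduce-overhead" mode additionally applies CUDA graphs on GPU. The full Llama work discussed in the talk composes many more pieces (Tensor Parallel, FSDP, activation checkpointing) on top of this.

    # Hypothetical sketch of composing two of the named optimizations;
    # not the implementation discussed in the talk.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyDecoderBlock(nn.Module):
        """A toy self-attention block standing in for a Llama-style layer."""
        def __init__(self, dim: int = 256, n_heads: int = 4):
            super().__init__()
            self.n_heads = n_heads
            self.qkv = nn.Linear(dim, 3 * dim)
            self.proj = nn.Linear(dim, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, t, d = x.shape
            # Project to queries/keys/values and split heads:
            # (batch, seq, dim) -> (batch, heads, seq, head_dim).
            q, k, v = (
                z.reshape(b, t, self.n_heads, d // self.n_heads).transpose(1, 2)
                for z in self.qkv(x).chunk(3, dim=-1)
            )
            # scaled_dot_product_attention dispatches to a FlashAttention
            # kernel when the hardware and dtypes support it.
            y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            y = y.transpose(1, 2).reshape(b, t, d)
            return x + self.proj(y)

    model = TinyDecoderBlock()
    # torch.compile fuses the module into optimized kernels; on GPU,
    # mode="reduce-overhead" also captures execution with CUDA graphs.
    compiled = torch.compile(model, mode="reduce-overhead")
    out = compiled(torch.randn(2, 128, 256))
    print(out.shape)  # torch.Size([2, 128, 256])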

Bio: Antoni Viros is a Research Scientist at IBM Research in Yorktown Heights, NY, where he works on optimization for large language models across training, fine-tuning, and inference. As part of this optimization work, he is an active contributor to PyTorch. Before IBM, he worked at Meta, where he also contributed to PyTorch. Antoni holds a PhD in Aerospace Engineering; his academic research focused on virtual assistants for engineering design, and he developed the prototype virtual assistant Daphne, which has been tested and used at NASA's Jet Propulsion Laboratory and Johnson Space Center, both by engineers developing new space missions and by aspiring astronauts training in isolation.

Faculty Host: