
CIIR Talk Series: Data Augmentation and User Simulation Using Large Language Models

Friday, 10/06/2023 1:30pm to 2:30pm
Computer Science Building, Room 150/151; Virtual via Zoom
Seminar

Abstract: With the recent advancements in large language models (LLMs), the generation of synthetic data and the simulation of user-system interactions have gained attention. The impressive success of LLMs in multiple NLP tasks provides opportunities for the IR community in various areas, including data augmentation and user simulation. In this talk, I will present our recent work on data augmentation. Unlike other studies, we leverage LLMs to generate synthetic documents from queries in a few-shot setting, outperforming state-of-the-art data augmentation methods. Additionally, I will provide an overview of our work on user simulation using LLMs, focusing on providing feedback in mixed-initiative conversations and transitioning from reactive to proactive user simulators. We demonstrate that GPT-2 and GPT-3.5 can match human performance in providing user feedback in single-turn and multi-turn settings. Moreover, we utilize GPT-4 in a proactive user simulation setting, where the simulated user can lead the conversation by delving into a given topic. In this study, we follow a similar setup to the QuAC dataset and examine the effectiveness of GPT-4 in playing the roles of both the student and the teacher.

Bio: Mohammad Aliannejadi is an Assistant Professor at the University of Amsterdam, The Netherlands. His research interests include single- and mixed-initiative conversational information access, user simulation, and recommender systems. He completed his PhD at the Università della Svizzera italiana, Switzerland, where he focused on novel information access approaches in conversations. He is passionate about advancing conversational search systems and has co-organized multiple data challenges in this area, including the ClariQ Conversational AI Challenge (ConvAI 3), the NeurIPS competition on Interactive Grounded Language Understanding in a Collaborative Environment (IGLU), the TREC Conversational Assistance Track (CAsT), and the TREC Interactive Knowledge Assistance Track (iKAT).

This talk is also available via Zoom. To obtain the passcode for this series, please see the event advertisement on the seminars email list or reach out to Alex Taubman (ataubman [at] cs.umass.edu). For any questions about this event with the Center for Intelligent Information Retrieval, please contact Jean Joyce (jean [at] cs.umass.edu).