CIIR Talk Series: Cross-Language Information Retrieval for Report Generation

Friday, 05/10/2024 1:30pm to 2:30pm
(In person) Computer Science Building, Room 150/151; Virtual via Zoom
Seminar

Abstract: The purpose of Cross-Language Information Retrieval (CLIR) is to rank documents in one language using queries in another language. Recent advances in large language models have led to a new wave of CLIR algorithms, roughly following their monolingual cousins. These algorithms improve retrieval effectiveness but say nothing about how end users are to use CLIR results (especially if they cannot read the documents). Machine translation offers one solution; it is finally good enough to produce fluent text, especially for well-written domains such as newswire. Generating a report in the language of the user based on the retrieved documents may be a better way to convey the results of such a search. This talk will discuss recent advances in CLIR and multilingual information retrieval (MLIR) at the Human Language Technology Center of Excellence, focusing on how to train multilingual pre-trained language models for the ColBERT bi-encoder architecture. It will also touch on the importance of evaluation data to drive new research and on the report generation task that is part of the TREC NeuCLIR Track in 2024.

Bio: Dr. Dawn Lawrie is a senior research scientist at Johns Hopkins University, where she works with the Human Language Technology Center of Excellence (HLTCOE). Her current research focuses on cross-language and multilingual information retrieval and the development of test collections to evaluate algorithms. Before joining the HLTCOE, she spent fifteen years as a professor of Computer Science at Loyola University Maryland and four years as department chair. Dr. Lawrie received her Ph.D. from the University of Massachusetts, Amherst in 2003. Broadly, her research interests include information retrieval, natural language processing, and applying techniques from those fields to software engineering problems. Currently, Dr. Lawrie is a co-organizer of TREC NeuCLIR. Twice she has served as a co-lead of the summer workshop SCALE, first on named entity recognition and then on cross-language information retrieval. In addition, she has been a Program Co-Chair and General Chair for the IEEE International Working Conference on Source Code Analysis and Manipulation. Dr. Lawrie received the best (short) paper honorable mention award from ECIR ’24.

Zoom link: https://umass-amherst.zoom.us/j/94145788592

To obtain the passcode for this series, please see the event advertisement on the seminars email list or reach out to Hamed Zamani.