Leveraging Explanations for Information Retrieval Systems under Data Scarcity

Wednesday, 06/05/2024 10:00am to 12:00pm
PhD Seminar
Speaker: Puxuan Yu

Explanations are becoming increasingly important to the advancement of information retrieval (IR) systems. On one hand, this is driven by the growing complexity of IR systems and by users' demand for transparency and interpretability; on the other hand, explanations can inherently improve the effectiveness of IR systems without necessarily being displayed to users. However, data scarcity poses significant challenges to developing these explanations: acquiring high-quality explanations for relevance judgments is prohibitively expensive, yet such explanations are crucial for training neural network-based IR models and explanation generation models. To overcome these challenges, we use open-domain knowledge and generative language models to facilitate the generation of user-oriented explanations for various IR tasks limited by data availability.

We start by introducing a novel model-agnostic task for search result explanations that emphasizes context-aware summaries, describing how each document relates to both the query and the other documents. To address this task, we design a novel Transformer-based encoder-decoder architecture. Next, we develop an inherently explainable IR model specifically designed to provide diversified reranking of retrieved documents. This model is pre-trained on open-domain data using explanation tasks and achieves state-of-the-art results in search result diversification with minimal domain-specific data. Additionally, we explore how natural language explanations, obtained by automatically identifying similarities and differences between document pairs, can help generative language models augment IR datasets through synthetic query generation. Finally, we use zero-shot generative language models to directly elicit natural language explanations of relevance between search queries and candidate documents, providing crucial auxiliary information for calibrating neural ranking models and thus enhancing their ability to generate meaningful scores.

Advisors: James Allan and Negin Rahimi

Join via Zoom