Speaker:

Hansi Zeng

Abstract:

Recent advances in large generative models have reshaped the landscape of information retrieval. Motivated by the strong scaling behavior of transformer-based language models, Generative Information Retrieval (GIR) reformulates document retrieval as a conditional generation problem. In this paradigm, each document is represented by a unique document identifier (DocID), and a sequence-to-sequence model is trained to generate relevant DocIDs directly given a query. By aligning retrieval with the next-token prediction objective of modern large language models, GIR offers a conceptually unified and differentiable framework that can potentially benefit from large-scale pre-training, advanced decoding strategies, and joint optimization with downstream generation tasks.

Despite its appealing formulation, existing generative retrieval models, such as the Differentiable Search Index (DSI), exhibit significant practical limitations. Empirical studies show that DSI struggles to scale to corpora containing millions of documents and often underperforms strong lexical and dense retrieval baselines on standard benchmarks such as MS MARCO. In addition, DSI relies on autoregressive decoding with constrained beam search, making inference computationally expensive and introducing a fundamental trade-off between retrieval effectiveness and efficiency. These challenges raise critical questions about the feasibility of deploying GIR in real-world, large-scale retrieval systems.
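To make the efficiency concern concrete, constrained decoding in this setting is typically implemented by restricting each generation step to token continuations that form a valid DocID, e.g., via a prefix trie. The sketch below illustrates this idea; the `DocIDTrie` class, the toy scoring function, and all names are illustrative assumptions for exposition, not DSI's actual implementation, which scores continuations with a sequence-to-sequence model.

```python
class DocIDTrie:
    """Prefix trie over DocID token sequences; decoding is restricted to valid DocIDs."""
    def __init__(self, docids):
        self.root = {}
        for seq in docids:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})
            node[None] = True  # end-of-DocID marker

    def allowed_tokens(self, prefix):
        """Tokens that can legally extend `prefix` toward some valid DocID."""
        node = self.root
        for tok in prefix:
            node = node.get(tok)
            if node is None:
                return []
        return [t for t in node if t is not None]


def constrained_beam_search(trie, score_fn, beam_size=2, max_len=8):
    """Beam search that only expands trie-valid continuations.

    `score_fn(prefix, tok)` stands in for the model's next-token log-probability.
    Cost grows with beam size and DocID length, which is the effectiveness vs.
    efficiency trade-off discussed above.
    """
    beams = [((), 0.0)]           # (prefix, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, lp in beams:
            for tok in trie.allowed_tokens(prefix):
                candidates.append((prefix + (tok,), lp + score_fn(prefix, tok)))
        if not candidates:
            break
        candidates.sort(key=lambda x: -x[1])
        beams = candidates[:beam_size]
        for prefix, lp in beams:  # record beams that form complete DocIDs
            node = trie.root
            for tok in prefix:
                node = node[tok]
            if None in node:
                finished.append((prefix, lp))
    finished.sort(key=lambda x: -x[1])
    return finished
```

For example, with DocIDs `(1,2,3)`, `(1,2,4)`, and `(5,6,7)` and a scoring function that prefers lower token ids, the search returns `(1,2,3)` first; invalid sequences such as `(1,6,...)` are never expanded because the trie prunes them.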

In this proposal, our goal is to make generative retrieval scalable and practical for real-world information retrieval systems that operate over millions of documents. We will develop novel optimization objectives and decoding strategies that jointly improve the effectiveness and efficiency of GIR models at scale. In particular, we will address limitations in DocID construction, ranking optimization, and beam search decoding to enable stable large-scale training and low-latency inference.

In the final stage of this proposal, we extend the generative information retrieval framework with explicit reasoning capabilities, with a particular focus on the ranking component. We propose reasoning-aware generative ranking models that perform structured intermediate reasoning before producing final rankings, analogous to the reasoning process employed by large language models such as o1 and R1.

Furthermore, we generalize the objective of ranking from optimizing for human-labeled document relevance to optimizing for downstream machine utility. In particular, we treat ranking as an optimizable search module designed to support deep research agents, where the goal is to select the information that maximally benefits the agent's final outcome.

Advisor:

Hamed Zamani