CIIR Talk Series: Benjamin Piwowarski, Closing the Efficiency–Effectiveness Gap with Cross-Encoders
Speaker
Benjamin Piwowarski, Sorbonne University
Abstract
Neural cross-encoders remain the effectiveness ceiling for text re-ranking, yet they sit at a singular point on the efficiency–effectiveness curve: accurate enough that we cannot retire them, slow enough that we wish we could. This talk confronts that trade-off through two complementary studies.
First, which training recipe actually matters? A controlled reproduction across 9 encoder backbones (BERT through ModernBERT/Ettin, 17M–184M parameters) and 6 training objectives shows that choosing the right loss rivals scaling the backbone. We also briefly describe experimaestro, the framework that makes this study auditable and reproducible.
Second, which cross-encoder computations are actually necessary? A progressive attention-masking analysis identifies the superfluous ones and yields MICE, a minimal-interaction architecture 4× faster than a standard cross-encoder—matching late-interaction latency without sacrificing ranking quality.
Together, these studies chart a concrete path toward closing the efficiency–effectiveness gap.
Speaker Bio
Dr. Benjamin Piwowarski is a senior researcher (Directeur de Recherche) at the French National Center for Scientific Research (CNRS), working in the MLIA team at ISIR (Sorbonne Université). His current research centers on natural language processing and information access, with a focus on neural information retrieval, dialogue-based information access, controlled text generation, multilingual representation learning, and a better understanding of transformer architectures. Previously, he held a Research Associate position at the University of Glasgow (2008-2011) on quantum-physics-inspired models for information access; worked at Yahoo! Research (2006-2008) on web mining and on models of the interaction between users and search engines; and was a postdoc at the University of Chile (2004-2006) on XQuery evaluation. His PhD (1999-2003) applied machine learning techniques (Bayesian Networks) to Structured Information Retrieval; within this field he also contributed to new evaluation metrics for search engines. He regularly serves on the program committees of the main Information Retrieval and NLP venues (SIGIR, CIKM, ECIR, ARR) and is a member of the CNRS National Committee, the evaluation body for CNRS researchers and laboratories. He was General Chair of SIGIR 2019 in Paris, and Program Co-Chair of ECIR 2018 (Grenoble) and ICTIR 2025 (Padua).
About
The CIIR Talk Series is an initiative for researchers and practitioners working on information retrieval and related disciplines to present their work.
Subscribe to the Zoom link/passcode notification mailing list by sending an email to ciir-talks-request [at] cs [dot] umass [dot] edu with "subscribe" as the email subject (without the quotation marks), or click here for the Zoom link and reach out to Hamed Zamani (zamani [at] cs [dot] umass [dot] edu, subject: CIIR Talks Passcode) for the passcode.