
Speaker

A photo of Franco Maria Nardini.

Franco Maria Nardini, ISTI-CNR

Abstract

In recent years, transformer-based large language models (LLMs) have fundamentally reshaped the way large textual collections are indexed and retrieved. A key driver of this transformation is the use of LLMs to learn high-dimensional, contextual sparse representations of input text. These representations are rapidly gaining popularity for several reasons: 1) they perform competitively with learned dense representations, 2) they are grounded in the LLM's vocabulary, enabling interpretability by design, and 3) they can be efficiently leveraged with a well-established data structure—the inverted index—to support fast maximum inner product search. In this talk, we will review recent advancements that enable efficient indexing and retrieval based on these sparse representations. We will then discuss current limitations and emerging challenges in this rapidly evolving area.
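The abstract's third point — that sparse representations pair naturally with an inverted index for fast maximum inner product search — can be illustrated with a minimal sketch. The code below is not from the talk; function names like `build_inverted_index` and the toy term weights are illustrative assumptions. Each document is a sparse vector (a dict mapping vocabulary terms to learned weights), and scoring a query only touches the postings lists of the query's nonzero terms.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: list of sparse vectors, each a dict {term: weight}.
    Returns term -> list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index

def search(index, query, k=3):
    """Term-at-a-time scoring: accumulate inner products by
    traversing only the postings of the query's nonzero terms."""
    scores = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: -item[1])[:k]

# Toy collection with hypothetical learned term weights.
docs = [
    {"neural": 1.2, "retrieval": 0.8},
    {"sparse": 0.9, "retrieval": 1.1, "index": 0.5},
    {"dense": 1.0, "embedding": 0.7},
]
index = build_inverted_index(docs)
results = search(index, {"retrieval": 1.0, "sparse": 0.5})
# Document 1 matches both query terms: 1.0*1.1 + 0.5*0.9 = 1.55
```

Because only terms shared between query and document contribute to the inner product, documents with no overlapping terms are never visited — the efficiency property that makes the inverted index attractive for learned sparse retrieval.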

Speaker Bio

Franco Maria Nardini is a Research Director at ISTI-CNR in Pisa, Italy. His research interests focus on Web Information Retrieval and Machine/Deep Learning. He has authored over 100 papers in peer-reviewed international journals, conferences, and other venues. He has served as General Co-Chair of ECIR 2025, Program Committee Co-Chair of SPIRE 2023, and Tutorial Co-Chair of ACM WSDM 2021. He is a co-recipient of the ECIR 2025 Best Student Short Paper Award, the ACM SIGIR 2024 Best Paper Runner-Up Award, the ECIR 2022 Industry Impact Award, the ACM SIGIR 2015 Best Paper Award, and the ECIR 2014 Best Demo Paper Award. He has coordinated activities in several EU and Italian research projects. He is a member of the editorial board of ACM TOIS and a PC member of SIGIR, ECIR, SIGKDD, CIKM, WSDM, IJCAI, and ECML-PKDD. He currently teaches “Information Retrieval” in the Computer Science and AI Master's degree programs at the University of Pisa.

Hybrid event posted in CIIR Talk Series