CIIR Talk Series: Alessandro Moschitti, Understanding RAG-LLMs: A Practical Perspective from Web-Based QA
Speaker
Alessandro Moschitti, Amazon
Abstract
Recent advances show that Large Language Models (LLMs) can answer a wide range of questions with impressive fluency, yet even state-of-the-art systems such as ChatGPT and Gemini remain susceptible to hallucinations—confident but incorrect statements. Retrieval-Augmented Generation (RAG) aims to address this limitation by grounding model outputs in external evidence, but its effectiveness hinges on the quality of retrieved context: noisy or loosely related passages can dilute key information and reduce answer reliability.
In this talk, we revisit RAG through the lens of our earlier work on Web-based Question Answering, where accurate search and reranking pipelines supported small generative models. This architecture—built on traditional retrieval engines, context-aware rerankers, and selective top-k evidence—offers an intuitive and practical interpretation of grounding in modern LLMs. We also describe a fine-tuning approach developed independently yet conceptually aligned with Reinforcement Learning from Human Feedback (RLHF). Finally, we present recent results on quantifying question complexity in RAG-LLMs and discuss emerging directions for optimizing grounding quality.
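The retrieve-rerank-generate architecture described above can be sketched in a few lines. The scoring functions below are hypothetical stand-ins (simple word overlap); a production system of the kind discussed in the talk would use a real search engine, a trained context-aware reranker, and an LLM as the generator.

```python
import re

def tokens(text):
    """Crude tokenizer: lowercase alphabetic words only (illustrative)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, corpus, k=10):
    """First-stage retrieval: rank passages by word overlap with the query
    (a stand-in for a traditional retrieval engine)."""
    q = tokens(query)
    scored = [(len(q & tokens(p)), p) for p in corpus]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]

def rerank(query, passages, k=2):
    """Second-stage reranking: prefer dense, on-topic passages
    (a stand-in for a trained cross-encoder reranker)."""
    q = tokens(query)
    def score(p):
        words = tokens(p)
        return len(q & words) / (1 + len(words))
    return sorted(passages, key=score, reverse=True)[:k]

def build_prompt(query, evidence):
    """Ground the generator in only the selected top-k evidence,
    keeping noisy or loosely related passages out of the context."""
    ctx = "\n".join(f"[{i+1}] {p}" for i, p in enumerate(evidence))
    return f"Answer using only the evidence below.\n{ctx}\nQuestion: {query}"

corpus = [
    "RAG grounds model outputs in retrieved evidence.",
    "Noisy passages can dilute key information.",
    "Large language models are fluent but can hallucinate.",
]
query = "why do LLMs hallucinate"
evidence = rerank(query, retrieve(query, corpus))
print(build_prompt(query, evidence))
```

The selective top-k step is the crux: passing only the reranked evidence to the generator, rather than everything the retriever returns, is what keeps loosely related passages from diluting the context.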
Speaker Bio
Alessandro Moschitti has been a Principal Research Scientist at Amazon since 2018. He led the science behind Alexa’s information services and built Alexa’s first web-based Question Answering (QA) system. In 2021, he developed Alexa’s first Retrieval-Augmented Generation LLM (RAG-LLM), which became the foundation of the Amazon Alexa LLM deployed to millions of users. His current work focuses on multimodal LLMs, retrieval, and information grounding for AGI, and he currently leads the science for Amazon’s Local Search Service.
Alessandro has over 25 years of experience in NLP, information retrieval, and machine learning. He contributed to IBM Watson’s Jeopardy! Grand Challenge and later served as a Principal Scientist at the Qatar Computing Research Institute, where he led a major collaboration with MIT CSAIL. He was a Professor at the University of Trento for 15 years and has published more than 350 peer-reviewed papers.
He has received four IBM Faculty Awards, one Google Faculty Award, and five best paper awards. He has held Chair roles in more than 70 international conferences and workshops across NLP, IR, ML, and AI, including General Chair of EACL 2023 and EMNLP 2014 and Program Co-Chair of CoNLL 2015.