Machine Learning and Friends Lunch: Yuanqi Du, Scientific Knowledge Emerges in LLMs and You Can Extract It
Speaker
Yuanqi Du (Cornell University)
Abstract
The emerging capabilities of large language models (LLMs) are opening new frontiers in scientific research, including experiment operation, literature retrieval, and molecular design. A central question, however, is whether LLMs truly encode scientific knowledge and, if so, how that knowledge can be systematically extracted. In this talk, I will present an affirmative answer to this question, supported by strong quantitative and empirical evidence. I will begin by framing knowledge extraction as a search problem with a computational verifier, and I will illustrate this framing through three problems: molecular optimization, crystal structure generation, and retrosynthesis. In all three cases, LLMs achieve impressive performance relative to state-of-the-art computational approaches. I will conclude by reflecting on analogous discoveries in other scientific domains and highlighting key questions for future exploration.
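
To make the search-with-a-verifier framing concrete, the following is a minimal Python sketch, not the speaker's actual system. The hypothetical llm_propose function stands in for a real LLM call (here it returns a fixed candidate pool so the loop runs end to end), and RDKit's LogP descriptor serves as a toy computational verifier for the molecular-optimization setting; all names and the scoring objective are illustrative assumptions.

    # Sketch: knowledge extraction as search over LLM proposals,
    # filtered and scored by a computational verifier. The toy task
    # is molecular optimization: maximize LogP (a standard
    # lipophilicity descriptor computed by RDKit) over SMILES strings.

    from rdkit import Chem
    from rdkit.Chem import Descriptors


    def llm_propose(history: list[tuple[str, float]], k: int = 4) -> list[str]:
        """Placeholder for an LLM call proposing candidate SMILES.
        A real system would format `history` (previously scored
        candidates) into a prompt and parse the model's completion;
        this stub returns a fixed pool so the sketch is runnable."""
        pool = ["CCO", "c1ccccc1", "CCCCCCCC", "CC(=O)Oc1ccccc1C(=O)O"]
        return pool[:k]


    def verify(smiles: str) -> float | None:
        """Computational verifier: reject unparseable proposals,
        score valid molecules with a cheap computed property."""
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:  # invalid molecule -> verifier rejects it
            return None
        return Descriptors.MolLogP(mol)

    def search(n_rounds: int = 3) -> tuple[str, float]:
        """Propose-verify loop: the LLM generates, the verifier filters,
        and scored candidates feed back into the next round."""
        history: list[tuple[str, float]] = []
        for _ in range(n_rounds):
            for smiles in llm_propose(history):
                score = verify(smiles)
                if score is not None:
                    history.append((smiles, score))
        return max(history, key=lambda pair: pair[1])

    if __name__ == "__main__":
        best_smiles, best_score = search()
        print(f"best candidate: {best_smiles} (LogP = {best_score:.2f})")

The key design point the sketch illustrates is that the verifier, not the LLM, is the arbiter of correctness: the model only needs to propose plausible candidates, and an external computation checks and ranks them.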
Speaker Bio
Yuanqi Du is a PhD candidate in Computer Science at Cornell University, where he studies the intersection of AI and scientific discovery. His research centers on developing principled, efficient probabilistic and geometric models that accelerate scientific discovery, from hypothesis search and validation to automation, with a special focus on the intersection of physics and chemistry and its applications in drug and materials discovery. Yuanqi’s work has appeared in leading machine learning venues (NeurIPS, ICML, ICLR) and has been featured as cover articles in top-tier scientific journals, including Nature, Nature Machine Intelligence, Nature Computational Science, and the Journal of the American Chemical Society. As a passionate community builder, Yuanqi has organized over 20 community events, including conferences, workshops, and seminars across AI for Science, geometric deep learning, and probabilistic machine learning.