PhD Dissertation Proposal: Alex Scarlatos, Creating Realistic Simulated Students: Fine-Tuning LLMs with Reinforcement Learning for Knowledge and Behavior Alignment
Speaker: Alex Scarlatos
Abstract:
As large language models (LLMs) are increasingly used in education, there is a growing need to quickly verify and steer LLM-generated content. In particular, it is important to ensure that educational AI systems improve learning outcomes for students. While it is possible to test new AI systems on real students, this process can be slow, costly, and risky for the students involved. Instead, simulated students, i.e., LLM-based models that mimic student behavior, can be used to test AI systems quickly and safely. However, LLMs do not typically behave like real students: they often fail to follow realistic behavioral patterns or knowledge trends, which limits their usefulness. This thesis presents multiple approaches for aligning LLMs with realistic student behavior and demonstrates how simulated students can promote better student outcomes in downstream generated content.
First, we study student simulation in open-ended testing settings, including essay writing and coding. We use simulated student responses to questions to obtain reliable estimates of question difficulty, which is necessary for calibrating standardized tests and question recommendation systems. We introduce SMART, a novel method for fine-tuning LLMs with reinforcement learning (RL) that aligns models with realistic response patterns based on student ability and question difficulty, using a reward function derived from item response theory (IRT). We show that SMART outperforms state-of-the-art methods for difficulty prediction and is much better aligned with student ability than other simulated student methods.
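As an illustrative sketch only (not the SMART implementation, whose details are not given here), a reward derived from IRT might score a simulated response by how likely its correctness is under a 1PL (Rasch) model given the target student's ability and the question's difficulty. The function names and the choice of the 1PL model are assumptions:

```python
import math

def irt_correct_prob(ability: float, difficulty: float) -> float:
    """1PL (Rasch) probability that a student of given ability answers correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def irt_reward(simulated_correct: bool, ability: float, difficulty: float) -> float:
    """Reward the simulator for producing responses whose correctness
    is plausible under the IRT model: the likelihood of the observed outcome."""
    p = irt_correct_prob(ability, difficulty)
    return p if simulated_correct else 1.0 - p
```

Under this sketch, a correct response from a high-ability student on an easy question earns a high reward, while the same response from a low-ability student earns a low one, pushing the simulator toward ability-consistent behavior.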
Second, we study student simulation in math tutoring dialogues, which are increasingly used in online learning platforms to provide real-time feedback and guidance to students. We propose a framework for estimating student knowledge across concepts in dialogue turns by adapting knowledge tracing (KT) to the dialogue setting. We introduce an LLM-based student model for this task, LLMKT, which outperforms classic KT approaches. We then show how to use student models to improve student outcomes with LLM-based tutors. We achieve this by training an LLM tutor with RL, where the reward, computed from LLMKT's knowledge estimates, encourages tutor turns that are expected to increase student knowledge. We find that this RL-trained tutor outperforms much larger models in terms of pedagogical quality and the likelihood of positive learning outcomes.
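A minimal sketch of the knowledge-gain reward idea, under the assumption that a KT model exposes a per-concept estimator mapping a dialogue history to a probability of concept mastery (the `estimate` callable and its signature are hypothetical, not LLMKT's actual interface):

```python
from typing import Callable, List

def knowledge_gain_reward(
    estimate: Callable[[List[str], str], float],  # hypothetical: (dialogue, concept) -> P(known)
    dialogue: List[str],
    tutor_turn: str,
    concepts: List[str],
) -> float:
    """Score a candidate tutor turn by the expected change in estimated
    student knowledge, averaged over the relevant concepts."""
    before = sum(estimate(dialogue, c) for c in concepts) / len(concepts)
    after = sum(estimate(dialogue + [tutor_turn], c) for c in concepts) / len(concepts)
    return after - before
```

During RL training, a tutor turn that the student model predicts will raise mastery estimates receives a positive reward; turns expected to confuse the student receive a negative one.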
Finally, we propose an evaluation framework for verifying the realism of simulated students in dialogues. Specifically, we will develop a set of automated evaluation metrics that compare behavioral, linguistic, and knowledge-based aspects of simulated student turns to real student turns. We will then train LLM-based simulated students with RL, using our automated metrics as reward functions, to optimize simulations for realism. Overall, this thesis details multiple approaches to student simulation that prioritize alignment with real student behavior and knowledge, filling a gap in the current field and enabling the development of educational AI that benefits from simulated students in the loop.
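To make the metric-as-reward idea concrete, here is a toy sketch (not the proposed metrics, which are still to be developed): a realism score that compares a simulated turn to a pool of real student turns on two simple linguistic features, vocabulary overlap and turn length. All names and the 50/50 weighting are assumptions for illustration:

```python
def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two turns."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def length_similarity(a: str, b: str) -> float:
    """Ratio of the shorter turn's word count to the longer's."""
    la, lb = len(a.split()), len(b.split())
    return min(la, lb) / max(la, lb) if max(la, lb) else 0.0

def realism_reward(simulated: str, real_turns: list) -> float:
    """Toy realism score: average lexical and length similarity of a
    simulated student turn against real student turns."""
    scores = [0.5 * token_jaccard(simulated, r) + 0.5 * length_similarity(simulated, r)
              for r in real_turns]
    return sum(scores) / len(scores)
```

In the proposed framework, richer behavioral and knowledge-based comparisons would replace these surface features, but the RL loop is the same: the simulated student is rewarded for turns that score closer to real student turns.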
Advisor:
Andrew Lan