PhD Thesis Defense: Nigel Fernandez, Natural Language Processing for Scalable Educational Assessment and AI Systems
Speaker: Nigel Fernandez
Abstract:
Recent advances in natural language processing (NLP), particularly large language models (LLMs), have created new opportunities to develop scalable educational technologies that support assessment, feedback, and personalized learning. Realizing this vision requires AI methods capable of performing several core educational tasks, including generating high-quality assessments, answering course-related questions reliably, and operating efficiently at scale. Developing methods that enable these capabilities is critical for building educational AI systems that can support instructors and learners in large, diverse learning environments. This dissertation investigates how NLP methods can enable scalable educational assessment and AI systems across several key components of the educational pipeline.
First, we study scalable assessment generation. Multiple-choice questions (MCQs) are widely used in educational settings, yet constructing effective distractors, i.e., plausible but incorrect answer options, is challenging and time-consuming. We introduce DiVERT (Distractor Generation with Variational Errors Represented as Text), a variational framework that represents the errors underlying distractors as text, modeling student misconceptions to produce realistic distractors for math MCQs.
Second, we investigate course logistics question answering, a practical application of educational AI systems that can help students access important course information. We introduce SyllabusQA, a dataset of 5,078 open-ended question-answer pairs derived from 63 real course syllabi across 36 majors. We benchmark several strong baselines on this task, ranging from LLM prompting to retrieval-augmented generation. To ensure reliability in educational contexts, we also introduce Fact-QA, an LLM-based evaluation metric designed to assess the factual accuracy of generated answers.
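To make the retrieval-augmented baseline concrete, here is a minimal sketch of retrieval-augmented question answering over syllabus text. It uses a simple bag-of-words cosine retriever and assembles a grounded prompt; the syllabus snippets, function names, and the elided LLM call are all illustrative assumptions, not the dissertation's actual system.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words counts over lowercased, punctuation-stripped tokens."""
    return Counter(w.strip(".,?!").lower() for w in text.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, passages, k=2):
    """Return the k syllabus passages most similar to the question."""
    q = vectorize(question)
    return sorted(passages, key=lambda p: cosine(q, vectorize(p)), reverse=True)[:k]

def build_prompt(question, passages):
    """Assemble a grounded prompt; the LLM generation step is elided."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the syllabus excerpts below.\n{context}\nQuestion: {question}"

# Toy syllabus for illustration only.
syllabus = [
    "Late homework is accepted within 48 hours for a 20 percent penalty.",
    "The final exam is cumulative and held during finals week.",
    "Office hours are Tuesdays 2-4pm in Room 140.",
]
top = retrieve("What is the penalty for late homework?", syllabus)
print(build_prompt("What is the penalty for late homework?", top))
```

In a real pipeline, a dense retriever and an LLM would replace the toy components, and a metric such as Fact-QA would check the generated answer against the retrieved evidence.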
Third, we address the efficient deployment of large language models in educational AI systems. We propose RADAR (Reasoning-Ability and Difficulty-Aware Routing), a psychometrics-inspired routing framework that allocates queries to different LLM configurations based on estimated query difficulty and model ability, enabling improved performance-cost tradeoffs.
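The psychometrics-inspired idea can be sketched with a Rasch (one-parameter IRT) model: each query gets a difficulty estimate, each LLM configuration an ability estimate, and the router picks the cheapest configuration whose predicted success probability clears a threshold. The model pool, ability and cost numbers, and threshold below are hypothetical illustrations, not RADAR's fitted parameters.

```python
import math

def p_success(ability, difficulty):
    """Rasch (1PL IRT) model: probability the model answers the query correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical pool of LLM configurations: (name, estimated ability, cost per query).
MODELS = [
    ("small", 0.5, 0.1),
    ("medium", 1.5, 1.0),
    ("large", 3.0, 5.0),
]

def route(query_difficulty, threshold=0.75):
    """Pick the cheapest model expected to succeed; fall back to the strongest."""
    for name, ability, cost in sorted(MODELS, key=lambda m: m[2]):
        if p_success(ability, query_difficulty) >= threshold:
            return name
    return max(MODELS, key=lambda m: m[1])[0]

print(route(-1.0))  # an easy query can go to the cheapest configuration
print(route(2.5))   # a hard query falls back to the strongest one
```

The performance-cost tradeoff comes from the threshold: raising it shifts more traffic to stronger, costlier configurations.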
Together, these contributions demonstrate how NLP methods can support scalable educational assessment and AI systems by enabling automated assessment generation, reliable educational question answering, and efficient LLM deployment.
Advisor: Andrew S. Lan