PhD Thesis Defense: Katherine Thai, "Modes of Human–AI Collaboration in Text: Benchmarks, Metrics, and Interpretive Tasks"
Speaker
Katherine Thai
Abstract
As large language models (LLMs) become ubiquitous in writing and research workflows, understanding how humans and AI systems collaborate to author text has emerged as a central research challenge. This collaboration spans a spectrum from generation to revision to agentic tasks, and studying and improving these interactions requires clear frameworks, datasets, and metrics. This thesis investigates how LLMs can assist, transform, and extend human work across three specific tasks: translation, editing, and literary interpretation.
The thesis begins with Par3, a study comparing how LLMs and human translators perform literary translation. Human evaluations show that expert translators strongly prefer human-produced translations over machine outputs. A post-editing model trained on Par3 narrows this gap, demonstrating that AI can support, but not replace, human interpretive labor in translation.

Next, the thesis introduces EditLens, a new framework for quantifying the magnitude of AI edits to human-written text. While prior work has focused on detecting fully AI-generated writing, EditLens shows that the degree of AI editing can be measured on a spectrum. This work provides a metric for quantifying collaboration in mixed-authorship human–AI writing, with implications for authorship attribution, education policy, and the study of writing assistance tools.

The final part of the thesis explores literary evidence retrieval as a testbed for high-level interpretive reasoning. Through RELiC, a dataset that requires models to recover quotations from literary works given scholarly commentary, the thesis shows that frontier models approach human-level retrieval efficiency. Together with a human evaluation of model reasoning traces, this chapter highlights both the promise and the difficulty of using AI systems as collaborators in open-ended interpretive work.

Collectively, these contributions reveal the opportunities and limitations of current systems while introducing a new methodology for measuring collaborative writing processes.
Advisor
Mohit Iyyer