Team Wins Top Prize in Mathematics Automated Scoring Challenge

Andrew Lan

A team of computer scientists from UMass Amherst, led by assistant professor Andrew Lan and including graduate students Wanyong Feng, Jaewook Lee, William McNichols, Alex Scarlatos and Mengxue Zhang, recently took home the top award in the second annual Math Automated Scoring Challenge, run by the National Center for Education Statistics, the organization that administers The Nation's Report Card.

In this challenge, participants created algorithms to score students' responses to open-ended questions about how they solved a multiple-choice mathematics problem. The winners used advanced natural language processing methods that promise to reduce scoring costs while providing additional insights about student responses. All the winning teams' scores were similar to human scoring on open-ended text questions.

"Open-ended, open-response and short-answer questions are more pedagogically effective than alternatives, such as multiple-choice, true-or-false and fill-in-the-blank questions," says Lan. "However, their widespread usage is limited since manually grading a large number of student responses is not feasible. Automated scoring can free up teacher time so that they can focus on less repetitive tasks, such as communicating with students and providing feedback."

Lan and his team adapted large language models and then trained them on data from The Nation's Report Card to make them effective at automated scoring. They also proposed a way to give students actionable feedback: telling them the minimum number of edits needed to turn their incorrect response into a correct one.
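
To give a rough sense of what a "minimum number of edits" means, here is a small illustrative sketch. It is not the team's method, which the article does not describe in detail and which relies on large language models. It simply assumes a classical word-level edit distance between a student's response and a single reference answer. The function name `min_token_edits` and the example sentences are hypothetical.

```python
def min_token_edits(response: str, reference: str) -> int:
    """Word-level edit distance (Levenshtein) between two short answers.

    Illustrative assumption only: counts the fewest word insertions,
    deletions, or substitutions needed to turn `response` into `reference`.
    """
    src = response.split()
    tgt = reference.split()

    # prev[j] holds the cost of turning the tokens of src seen so far
    # into the first j tokens of tgt. With no src tokens, that is j inserts.
    prev = list(range(len(tgt) + 1))

    for i, s_tok in enumerate(src, start=1):
        curr = [i]  # turning i src tokens into nothing takes i deletions
        for j, t_tok in enumerate(tgt, start=1):
            cost = 0 if s_tok == t_tok else 1
            curr.append(min(
                prev[j] + 1,         # delete s_tok
                curr[j - 1] + 1,     # insert t_tok
                prev[j - 1] + cost,  # keep or substitute
            ))
        prev = curr

    return prev[-1]


if __name__ == "__main__":
    student = "I added 3 and 4"
    correct = "I multiplied 3 and 4"
    print(min_token_edits(student, correct))
```

In this toy case a single word substitution separates the two answers. A real scoring system would have to account for the many different ways a correct explanation can be phrased, which a plain word-level comparison against one reference cannot do.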

"Recent student performance on mathematics items underscores how important it is for us to provide advanced approaches to scoring," said NCES Commissioner Peggy G. Carr. "Winning teams scored responses accurately and described their results thoroughly. They conducted fairness analyses showing that their algorithms did not score students differently based on their demographic or family background. These results provide encouraging evidence for NAEP to implement automated scoring in several subjects and further explore the potential of automated scoring."

"We are glad to see our approaches validated on real student data, especially real data from The Nation's Report Card, which shows that there's potential to make a real-world impact," says Lan. "If AI technology can be used in a responsible and controlled way, many students and teachers can potentially benefit from it."

Originally published by the UMass Amherst Office of News and Media Relations.