Mellon Foundation grant awarded to R. Manmatha

Computer Science Research Associate Professor R. Manmatha received a one-year grant from the Andrew W. Mellon Foundation to support the development of software and techniques for scholars in the humanities to use in processing large corpora of digitized books.

The $205,000 grant, Proteus Infrastructure: Work Aggregation and Entity Extraction, is a collaboration between Manmatha, Professor David Smith at Northeastern University (formerly research faculty at UMass Amherst CS), and UMass Amherst Computer Science Professor James Allan.

Specifically, this is a pilot project to build and evaluate research infrastructure for scanned books. While there are several large scanned book collections (for example, the Internet Archive), much of this material is unstructured and not easily used by scholars in the humanities. "The grant will support building the Proteus infrastructure which will help scholars navigate and use such collections more easily," says Manmatha. "Components of the infrastructure include automatically identifying a book's language, linking multiple editions of canonical works, finding quotations in canonical works, and entity detection. One of the key aims of the project is to do all these tasks efficiently at large scale."

Manmatha is also associated with a Mellon Foundation grant awarded in 2012 to Texas A&M. As part of this grant, OCRing Early Modern Text, UMass will take the output of optical character recognition (OCR) systems on 18th-century English books and use its technology to automatically estimate OCR errors and correct the output of multiple OCR engines.

More on the Mellon Foundation: http://www.mellon.org/