Latest CS research: from mining books to solving crimes

The department's faculty are embarking on many exciting new projects. A few are highlighted below.

Mining a Million Scanned Books - In this NSF-funded project, "Mining a Million Scanned Books: Linguistic and Structure Analysis, Fast Expanded Search, and Improved Optical Character Recognition (OCR)," the UMass Amherst Center for Intelligent Information Retrieval (CIIR), the Perseus Digital Library Project at Tufts University, and the Internet Archive are investigating large-scale information extraction and retrieval technologies for digitized book collections. This research is being carried out using a collection of over a million scanned books that includes 8.5 terabytes of text and half a petabyte of scanned images. Professors James Allan, R. Manmatha, and David Smith are leading the research within the CIIR to develop new approaches to processing the large collection.

Pacemaker Interference - Assistant Professor Kevin Fu and graduate student Ben Ransford, along with researchers from the University of Washington and the Department of Medicine at Beth Israel Deaconess Medical Center, warn about adverse magnetic interactions between headphones and pacemakers. Their study was published in October in the HeartRhythm Journal, the leading specialty journal in cardiology with the largest circulation and readership base. The research showed that portable headphones can cause clinically significant magnetic interference with implanted cardiac devices. Patients with these devices are now advised to keep headphones away from the implant to avoid such interference.

Advances in OCR - Assistant Professor Erik Learned-Miller (PI) and Professor Andrew McCallum received an NSF grant for their project "Coordinating Language Modeling, Computer Vision, and Machine Learning for Dramatic Advances in Optical Character Recognition." In this project, they are investigating "iterative contextual modeling," an approach to OCR in which high-confidence recognitions of the easier portions of a document are used to help develop document-specific models. These models can be related to appearance: for example, a sample of correctly recognized words can be used to develop a model of the font in a particular document. The models can also be based on language and vocabulary information.
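
The idea of reusing high-confidence recognitions can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the project's actual method: each word is represented as a tuple of hypothetical glyph-cluster IDs (identical IDs stand for glyphs that look alike in this document), and the function names, the 0.9 threshold, and the sample data are all invented for illustration.

```python
from collections import Counter, defaultdict

# Each word: (glyph_cluster_ids, recognizer_guess, confidence).
# Cluster IDs are hypothetical: equal IDs mean "same-looking glyph".

def build_font_model(words, threshold=0.9):
    """Learn a glyph-cluster -> character map from high-confidence words."""
    votes = defaultdict(Counter)
    for glyphs, guess, conf in words:
        if conf >= threshold and len(glyphs) == len(guess):
            for glyph, ch in zip(glyphs, guess):
                votes[glyph][ch] += 1
    # Keep the most frequently observed character for each glyph cluster.
    return {g: counts.most_common(1)[0][0] for g, counts in votes.items()}

def redecode(words, font_model, threshold=0.9):
    """Re-read low-confidence words using the document-specific font model."""
    results = []
    for glyphs, guess, conf in words:
        if conf >= threshold:
            results.append(guess)
        else:
            # Fall back to the original character when a glyph is unknown.
            results.append("".join(
                font_model.get(glyph, ch) for glyph, ch in zip(glyphs, guess)
            ))
    return results

words = [
    ((1, 2, 3), "the", 0.98),
    ((4, 5, 1), "cat", 0.95),
    ((2, 5, 1), "bat", 0.40),  # low confidence: first glyph may be misread
]

font_model = build_font_model(words)
print(redecode(words, font_model))
```

In this toy data the uncertain word shares its first glyph cluster with the "h" in the confidently read "the", so the font model rereads it as "hat". A language-based model, which the project also considers, would be a separate component and is not sketched here.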

Human Emotion Sensors - Research Professor Beverly Woolf continues her research on sensors that detect human emotion. Woolf and her research team have demonstrated that intelligent tutoring systems can provide adaptive feedback based on an individual student's affective state. Their primary research goal is to identify whether a dependency exists between students' reported emotions and their learning, motivation, and attitudes toward a subject. The sensors are placed on a student's chair, mouse, monitor, and wrist to provide data about posture, movement, grip tension, facially expressed mental states, and arousal. Woolf's researchers recently tested the system with 600 students in Deerfield, MA, to confirm that their sensors can predict student emotion (frustrated, bored, etc.) with up to 80% accuracy when compared with each student's own report of his or her emotion.

Molecular Playground - The molecular aspects of nature are too often viewed as inaccessible and uninteresting to the general public. While the public can appreciate the beauty of a flower or a swan, the molecular basis of these organisms goes unnoticed. Professor Emeritus Allen Hanson and colleagues from chemistry, microbiology, and computer science are working on a project to get that molecular basis noticed. They are developing a system for displaying large-scale interactive molecules in prominent public spaces. The first such system has been installed in the new Integrated Sciences Building on campus. The aim is to capture the public's attention and to prod individuals to personally explore a vast array of molecular structures in a human-size "molecular playground." The local Playground installation consists of a projector, an infrared (IR) illuminator, and a camera fitted with a filter that blocks visible light but passes IR. In this way, the camera "sees" the person playing with the image but does not see the projected image itself. The camera tracks movement, and the software then decides what to use as a trigger to tell the system to stop the pre-programmed animation and instead deliver "rotate" commands to the system.
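
The switch from animation to "rotate" commands can be pictured with a short Python sketch. The article does not say how the installation detects movement, so simple frame differencing is assumed here purely for illustration; the function names, the thresholds, and the tiny 4x4 "IR frames" are all hypothetical.

```python
def motion_fraction(prev_frame, curr_frame, pixel_delta=30):
    """Fraction of pixels whose IR intensity changed by more than pixel_delta."""
    changed = sum(
        abs(a - b) > pixel_delta
        for prev_row, curr_row in zip(prev_frame, curr_frame)
        for a, b in zip(prev_row, curr_row)
    )
    total = sum(len(row) for row in curr_frame)
    return changed / total

def next_command(prev_frame, curr_frame, trigger=0.05):
    """Return "rotate" when enough of the frame moved, else keep animating."""
    if motion_fraction(prev_frame, curr_frame) >= trigger:
        return "rotate"
    return "animate"

# Two tiny 4x4 frames: an empty scene, then a bright 2x2 region
# standing in for a visitor's hand lit by the IR illuminator.
empty = [[0] * 4 for _ in range(4)]
hand = [row[:] for row in empty]
for r in (1, 2):
    for c in (1, 2):
        hand[r][c] = 200

print(next_command(empty, empty))
print(next_command(empty, hand))
```

Because the camera's filter blocks visible light, the projected molecule never appears in these frames, so any change between frames can be attributed to the visitor rather than to the animation itself.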

Forensic Analysis - Associate Professor Brian Levine (PI) is working on an NSF-funded project to significantly advance forensic analysis for crimes involving mobile systems. While current methods and legislation focus heavily on logical identifiers, Levine and UMass Amherst Electrical & Computer Engineering Professor Dennis Goeckel (co-PI) are designing, evaluating, and deploying new forensic techniques that focus on consistent and trackable characteristics of mobile computing. Their work also plays an important role in understanding the limits of personal privacy in these settings. The team is developing new radio fingerprinting techniques that detect identifying information present in a radio's low-level components, as well as novel traffic analysis techniques that determine the source of encrypted Web traffic. Their research will directly assist law enforcement agencies that investigate the network trafficking of images of child sexual exploitation, demonstrating the usability of trustworthy computing.