Recent Ph.D. graduates (AY 2012 - 2013)

Recent Computer Science Ph.D. graduates (AY 2012-2013)

Michael Bendersky; Information Retrieval with Query Hypergraphs; (W. Bruce Croft, Advisor); Sept. 2012; Software Engineer, Google, Inc.

In this thesis we focus on verbose natural language search queries. To this end, we propose an expressive query representation based on query hypergraphs. Unlike the existing query representations, query hypergraphs model the dependencies between arbitrary concepts in the query, rather than dependencies between single query terms.  Query hypergraphs are parameterized by importance weights, which are assigned based on their contribution to the retrieval effectiveness. 

Query hypergraphs are not limited to modeling the explicit query, and we develop two methods for query expansion using query hypergraphs. In these methods, the expansion concepts may come either from the retrieval corpus or from a combination of external information sources. We empirically demonstrate that query hypergraphs are significantly more effective than many of the current state-of-the-art retrieval methods. Query hypergraphs improve the retrieval performance for all query types, and, in particular, they exhibit the highest effectiveness gains for verbose queries.

Toby Dragon; The Impact of Integrated Coaching and Collaboration Within an Inquiry Learning Environment; (Beverly Woolf, Advisor); May 2013; Assistant Professor, Computer Science, Ithaca College

This thesis explores the design and evaluation of a collaborative, inquiry learning Intelligent Tutoring System for ill-defined problem spaces. The common ground in the fields of Artificial Intelligence in Education and Computer-Supported Collaborative Learning is investigated to identify ways in which tutoring systems can employ both automated coaching and collaborative techniques to support students as they learn. The resulting system, Rashi, offers feedback on student work by using an Expert Knowledge Base to recognize students' work. Evaluation in actual classrooms demonstrated that collaboration significantly improves students' contributions, and some evidence suggests that there is a significant positive correlation between the amount of coaching received and metrics that represent positive inquiry behavior. Finally, this thesis highlights the potential for combining coaching and collaboration such that 1) collaborative work can create more opportunity to provide automated coaching and 2) automated coaching can identify key moments when collaboration should be encouraged.

Mark Floryan; Evolving Expert Knowledge Bases: Applications of Crowdsourcing and Serious Gaming to Advance Knowledge Development for Intelligent Tutoring Systems; (Beverly Woolf, Advisor); May 2013; Visiting Lecturer, University of Virginia.

This dissertation presents a novel effort to develop intelligent tutoring system (ITS) technologies that adapt by observing student behavior. We define an evolving expert knowledge base (EEKB) that structures a domain's information and evolves its state over time. An algorithm observes students as they work within our ITS Rashi, and coalesces contributions to form this EEKB. We discover that EEKB models can be constructed accurately, and with significant efficiency compared to human-constructed models. We also examine the impact that game mechanics have on this process. Students who are given additional game mechanics contribute higher amounts of data, while performing higher-quality work. Additionally, we define knowledge-refinement games (KRGs), which motivate subject matter experts (SMEs) to refine an EEKB. Our experiments provide evidence that both the quality and breadth of knowledge within the EEKB are increased when experts use the KRG.

Naomi Fox; Accurate and Robust Mechanical Modeling of Proteins; (Ileana Streinu, Advisor); Feb. 2013; Postdoctoral Fellow, Lawrence Berkeley National Laboratory

The focus of this thesis is improving accuracy and robustness of computational protein rigidity analysis systems. One contribution is in new approaches to mechanical modeling of non-covalent interactions, namely hydrogen bonds and hydrophobic interactions. Unlike covalent bonds, the behavior of these interactions varies with their energies. I investigate energy-refined modeling of these interactions. Included is a method to assign a score to a predicted cluster decomposition, adapted from the B-cubed score from information retrieval. Another contribution is in new approaches to measuring the robustness of rigidity analysis results. The protein's fold is held in place by weak, non-covalent interactions, known to break and form during natural fluctuations. I propose an approach to measure the robustness of rigidity results, by studying how detrimental the loss of a single interaction may be to a cluster's rigidity. The study shows that, when present, highly critical interactions are concentrated around the active site, indicating that nature has designed a very versatile system for transitioning between unique conformations.

Filip Jagodzinski; Towards Large Scale Validation of Protein Flexibility Using Rigidity Analysis; (Ileana Streinu, Advisor); Sept. 2012; Assistant Professor, Dept. of Computer Science, Central Washington Univ.

Proteins flex and bend to perform their functions. At the atomic level, their motions cannot be observed. Rigidity analysis is a graph-based technique that infers the flexibility of molecules. Due to the lack of convenient tools for curating protein data, the usefulness of rigidity analysis in inferring biophysical properties has been demonstrated on only a handful of molecules. Conversely there is no agreed-upon choice of modeling of important stabilizing interactions.

We make progress towards large-scale validation of protein flexibility using rigidity analysis. We develop the KINARI software that permits automated curation of protein data. Rigidity analysis of protein biological assemblies generated by KINARI provides information that would be missed if only the unprocessed data were analyzed. We develop KINARI-Mutagen, which permits evaluation of the effects of mutations. Finally, we systematically vary the modeling of inter-atomic interactions and measure how rigidity parameters correlate with experimental data.

[Note:  Special thanks to Filip for all of his efforts as a Significant Bits graduate student liaison for the past six years.]

Jin Young Kim; Retrieval and Evaluation Techniques for Personal Information; (W. Bruce Croft, Advisor); Sept. 2012; Applied Researcher, Microsoft Bing

Providing an effective mechanism for personal information retrieval is important for many applications, and requires different techniques than have been developed for general web search. This thesis focuses on developing retrieval models and representations for personal search, and on designing evaluation frameworks that can be used to demonstrate retrieval effectiveness in a personal environment.

From the retrieval model perspective, personal information can be viewed as a collection of multiple document types each of which has unique metadata. Based on this perspective, we propose a retrieval model that exploits document metadata and multi-type structure. Proposed retrieval models were found to be effective in other structured document collections, such as movies and job descriptions.

Evaluating these methods is particularly challenging for personal information due to privacy issues. This thesis introduces a set of techniques that enables realistic and repeatable evaluation of techniques for personal information retrieval. In particular, we describe techniques for simulating test collections and show that game-based user studies can collect more realistic usage data with relatively small cost.

Akshat Kumar; Exploiting Domain Structure in Multiagent Decision Theoretic Planning and Reasoning; (Shlomo Zilberstein, Advisor); May 2013; Research Scientist, IBM Research, India

This thesis focuses on decision-theoretic reasoning and planning problems that arise when a group of collaborative agents are tasked to achieve a goal that requires collective effort. We examine these decision-theoretic problems within the framework of distributed constraint optimization problems (DCOPs) and decentralized partially observable MDPs (Dec-POMDPs). For DCOPs, a new variational formulation is developed that provides tighter approximation and better quality solutions than previous approaches. For planning under the Dec-POMDP framework, a number of algorithms based on machine learning and mathematical optimization are developed. The main contribution of this thesis is the development of effective, scalable and quality-bounded computational approaches for multiagent planning under uncertainty. This is achieved by a synthesis of techniques from multiple areas of artificial intelligence, machine learning and operations research. Empirically, each algorithmic contribution has been tested rigorously on common benchmark problems and, in many cases, real-world applications from machine learning and operations research literature.

Scott Kuindersma; Variable Risk Policy Search for Dynamic Robot Control; (Roderic Grupen and Andrew Barto, Advisors); Sept. 2012; Postdoctoral Associate, MIT Computer Science and Artificial Intelligence Laboratory

In this thesis, I present efficient global and local risk-sensitive stochastic optimization algorithms suitable for performing policy search in variety of problems of interest to robotics researchers. These algorithms exploit new techniques in nonparameteric heteroscedastic regression to directly model the policy dependent distribution of cost. For local search, learned cost models can be used as critics for performing risk-sensitive gradient descent. Alternatively, decision-theoretic criteria can be applied to globally select policies to balance exploration and exploitation in a principled way, or to perform greedy minimization with respect to risk-sensitive criteria. This separation of learning and policy selection leads to variable risk control, where risk sensitivity can be flexibly adjusted and appropriate policies can be selected at runtime without requiring additional policy executions. I describe several experiments with the uBot-5 including learning dynamic arm motions to stabilize after large impacts, lifting heavy objects while balancing, and developing safe fall bracing behaviors.

Yariv Levy; Multiscale Modeling of Human Addiction: a Computational Hypothesis for Allostasis and Healing; (Andrew Barto and Jerrold Meyer, Advisors); Feb. 2013

I presented a computational multiscale framework for predicting behavioral tendencies related to human addiction. My first contribution is a formal, heuristic, and exploratory framework to conduct interdisciplinary investigations about the neuropsychological, cognitive, behavioral, and recovery constituents of addiction. My second contribution proposes a computational framework to account for real-life recoveries that are not dependent on pharmaceutical, clinical, and counseling support. My third contribution introduces a computational hypothesis about the allostatic theory of addiction and indicates how mechanisms within the brain's reward system may stabilize the functional state of an addict, and how mechanisms within the endocrine system may act as source for possible recoveries. The formal arguments presented in my dissertation are illustrated by simulations which delineate archetypal patterns of human behavior toward drug consumption: escalation of use and influence of conventional and alternative rehabilitation treatments. Results obtained from this computational framework encourage an integrative approach to drug rehabilitation therapies.

Andres Molina-Markham; Privacy-Aware Collaboration Among Untrusted Resource Constrained Devices; (Kevin Fu, Advisor); Feb. 2013; Postdoctoral Researcher, Dartmouth College

Individuals are increasingly encouraged to share private information with service providers. Privacy is relaxed to increase the utility of the data for the provider. This dissertation offers an alternative approach in which raw data stay with individuals and only coarse aggregates are sent to analysts. A challenge is the reliance on constrained devices for data collection. This dissertation demonstrates the practicality of this approach by designing and implementing privacy-aware systems that collect information using low-cost or ultra-low-power microcontrollers. Smart meters can generate certified readings suitable for use in a privacy-preserving system every 10s using a Texas Instruments MSP430 microcontroller. CRFIDs--batteryless devices that operate on harvested energy from radio frequency--can generate encrypted sub-aggregates in 17s to contribute to a privacy-preserving aggregation system that does not rely on a trusted aggregator. A secure communication channel for CRFID tags via untrusted relays achieves a throughput of 18Kbps.

Benjamin Ransford; Transiently Powered Computers; (Kevin Fu, Advisor); May 2013; Postdoctoral Fellow, Computer Science & Engineering, University of Washington

My research aims to make energy harvesting a practical way to power small computing systems, and my thesis explores software and hardware techniques to enable this goal. The main challenge for energy-harvesting computers is to compute and communicate opportunistically, completing tasks on small bursts of energy punctuated by power loss. My dissertation presents two systems that respectively insulate computations from power loss and safely outsource storage to more-capable devices. It also demonstrates that a tiny radio-powered computer can endow a safety-critical medical device with improved security properties, and that replacing a medical device's traditional active radio transmitter with a radio-powered computer results in significant energy savings.

Matthew Rattigan; Leveraging Relational Representations for Causal Discovery; (David Jensen, Advisor); Sept. 2012; Digital Analyst, Obama For America

This thesis represents a synthesis of relational learning and causal discovery, two subjects at the frontier of machine learning research.  Relational learning investigates algorithms for constructing statistical models of data drawn from multiple types of interrelated entities, and causal discovery investigates algorithms for constructing causal models from observational data. 

Traditionally, propositional (or "flat") data representations have dominated the statistical sciences.  These representations assume that data consist of independent and identically distributed (iid) entities which can be represented by a single data table.  More recently, data scientists have increasingly focused on "relational" data sets that consist of interrelated, heterogeneous entities.  However, relational learning and causal discovery are rarely combined. 

This unexplored topical intersection represents an opportunity for advancement, in which we can provide insight into the challenges found in each subject area.  By adopting a causal viewpoint, we can clarify the mechanisms that produce previously identified pathologies in relational learning.  Analogously, we can utilize relational data to establish and strengthen causal claims in ways that are impossible using only propositional representations.

Elisha Rosensweig; On the Analysis and Management of Cache Networks; (James Kurose, Advisor); Sept. 2012; Software Developer, CloudBand, Alcatel-Lucent

Over the past few years Information-Centric Networking, a networking architecture in which host-to-content communication protocols are introduced, has been gaining much attention. A central component of such an architecture is a large-scale interconnected caching system. To date, the modeling of these cache networks, as well as understanding of how they should be managed, are both in their infancy. This dissertation sets out to consider both of these challenges. We consider approximate and bounding analysis of cache network performance, the convergence of such systems to steady-state, and the manner in which content should be searched for in a cache network. Taken as a whole, the work presented here constitutes an array of fundamental tools for addressing the challenges posed by this new and exciting field.

Mastooreh (Negin) Salajegheh; Software Techniques to Reduce the Energy Consumption of Low-Power Devices at the Limits of Digital Abstractions; (Kevin Fu, Advisor); Feb. 2013; Postdoctoral Research Associate, Computer Science, University of Virginia

My thesis explores software techniques that bend digital abstractions in order to allow embedded systems to do more computation with less energy. The capabilities and size of the embedded systems continue to improve dramatically; however, improvements in battery density and energy harvesting have failed to mimic Moore's law. Thus, energy remains a formidable bottleneck for low-power embedded systems. My research considers three methods that unleash energy otherwise squandered on communication, storage, and time keeping: 1) CCCP, which provides an energy-efficient storage alternative to local non-volatile storage by relying on cryptographic backscatter radio communication, 2) Half-Wits, which reduces energy consumption by 30% by allowing operation of embedded systems at below-spec supply voltages and implementing NOR flash memory error recovery in firmware rather than strictly in hardware, 3) TARDIS, which exploits the decay properties of SRAM to estimate the duration of a power failure ranging from seconds to several hours depending on hardware parameters.

Shiraj Sen; Bridging the Gap between Autonomous Skill Learning and Task Specific Planning; (Roderic Grupen, Advisor); Feb. 2013; Postdoctoral Research Associate, School of Computer Science, University of Massachusetts Amherst

Skill acquisition and task specific planning are essential components of any robot system, yet they have long been studied in isolation. This, I contend, is due to the lack of a common representational framework. I present a holistic approach to planning robot behavior, using previously acquired skills to represent control knowledge (and objects) directly, and to use this background knowledge to build plans in the space of control actions. Actions in this framework are closed-loop controllers constructed from combinations of sensors, effectors, and potential functions. I will show how robots can use reinforcement learning techniques to acquire sensorimotor programs. The agent then builds a functional model of its interactions with the world as distributions over the acquired skills. In addition, I present two planning algorithms that can reason about a task using the functional models. These algorithms are then applied to a variety of tasks such as object recognition and object manipulation to achieve the objective on two different robot platforms.

Navin Sharma; Designing Distributed Systems for Intermittent Power; (Prashant Shenoy, Advisor); May 2013; Postdoctoral Research Associate, School of Computer Science, University of Massachusetts Amherst

The increasing demand for computing infrastructure, such as data centers and storage systems, has increased their energy footprint. As a result of this growth, computing infrastructure today contribute 2-3% of the global carbon emissions. To reduce the financial and environmental impact of growing energy demands, the design of eco-friendly green infrastructure has become an important societal need. This thesis focuses on designing distributed systems, primarily data centers and storage systems, to run on renewable energy sources such as solar and wind.

Upendra Sharma; Elastic Resource Management in Cloud Computing Platforms; (Prashant Shenoy, Advisor); May 2013; Research Staff Member, IBM T. J. Watson Research Center

Enterprise applications are known to observe dynamic workload; provisioning correct capacity for these applications remains an important and challenging problem. Predicting fluctuations in workload or the peak workload is a difficult challenge; erroneous predictions often lead to under-utilized systems or in some situations cause temporary outage of an otherwise well provisioned web-site. Consequently, rather than provisioning server capacity to handle infrequent peak workloads, an alternate approach of dynamically provisioning capacity in response to workload fluctuations has become popular. Cloud platforms are particularly suited for such applications due to their ability to dynamically provision capacity while charge on pay-per-use basis. Consumers enjoy the benefits of resource elasticity but face the challenge of efficient resource management. Administrators of cloud platforms, on the other hand, face the challenge of efficient management/deployment of component-services to support seamless application elasticity. In my thesis I describe solutions to these problems by designing/implementing intelligent systems.

Rahul Singh; Resource Management for Enterprise Data Center Applications; (Prashant Shenoy, Advisor); May 2013; Research Staff Member, IBM T. J. Watson Research Center

Today's enterprises depend heavily on their information technology (IT) infrastructure for performing a variety of tasks. The IT infrastructure of a large enterprise comprises of a large number of software applications that utilize various resources including both software and hardware resources. The massive scale of these enterprise applications as well the complicated interactions between their various resources, makes resources management of such enterprise applications extremely challenging. Also, there are dynamics at various time scales that need to be managed effectively. In this thesis we investigate various challenges in resource management of such large-scale distributed enterprise applications and develop systems that solve some of these management tasks arising at multiple time scales. The systems developed in this thesis solve problems arising at the scale of years like IT transformation and "what-if" analysis as well as problems arising at the scale of seconds like dynamic resource provisioning and handling transiency.

Hamed Soroush; Measurement-Driven Characterization of the Mobile Environment; (Brian Levine, Advisor); May 2013; Visiting Lecturer, Department of Computer Science, University of Virginia

This dissertation proposes new tools and techniques for the characterization of the environment impact on mobile networks. Novel mechanisms for improving connectivity and throughput in highly mobile scenarios are presented and new privacy challenges for mobile users are demonstrated. This work includes development and evaluation of large-scale mobile experimentation infrastructures that facilitate longitudinal studies of today's technologically diverse mobile environment. Based on these studies, a mechanism called Spider is presented that efficiently utilizes Wi-Fi deployments in highly mobile scenarios, achieving a 400% improvement in throughput and 54% improvement in connectivity over stock Wi-Fi implementations. Analyzing hundreds of traces gathered over a large geographic area shows that patterns of data transmission between a server on the Internet and a moving cell phone can reveal the geographic travel path of that phone. The presented technique achieves 94.7% accuracy in distinguishing mobile phones from stationary phones. Routes taken by each mobile phone could be distinguished with up to 75.9% accuracy.

Thanh Tran; High-Performance Processing of Continuous Uncertain Data; (Yanlei Diao, Advisor); May 2013; Data Scientist, Twitter

Uncertain data has arisen in many sensing and scientific applications. The raw data in these applications is often incomplete, imprecise, and misleading, which has two implications: (i) the raw data is not directly queriable, and (ii) feeding the uncertain data into existing systems produces results of unknown quality. My thesis presents a system for uncertain data processing with two main functionalities, (i) capturing and transforming raw noisy data to queriable tuples with quantified uncertainty, and (ii) performing query processing on such tuples while characterizing the result uncertainty. I propose a probabilistic modeling and inference approach for data capture and transformation, and demonstrate it for RFID data. I then examine query processing for continuous uncertain data by presenting new data models and processing algorithms to compute query answers, either exact or approximate with bounded errors. The experimental results show that the proposed techniques outperform Monte Carlo sampling for many important workloads.

Edward Walters, Generating Simulators from Multi-level Coordinated Specifications; (J. Eliot B. Moss, Advisor); May 2013; Senior Software Systems Engineer, MITRE Corporation

Computer system simulations are one of the hardware and software designer's most useful and pervasive tools, but they can be difficult and time consuming to adapt to new requirements and design considerations. This dissertation presents a methodology and toolset for describing and generating a new type of mixed-category simulators we call blended simulators. Blended simulators have two novel features: one specifies them by using multiple semantically related specification languages, each of which describes a computer system at a different level of detail; and we automatically generate the resulting simulators from the set of specifications, using a sophisticated search process to blend behavioral, structural, and timing information at a fine-grained level. This results in a simulator that accurately captures timing metrics, but relies on functional simulation constructs when most appropriate. Other contributions include the CASL language for specifying modern micro-architectures within this system, and CSIP: our methodology for effectively describing the interplay between static and dynamic elements within a micro-architecture.

Xiaobing Xue; Modeling Reformulation as Query Distributions; (W. Bruce Croft, Advisor); Sept. 2012; Software Developer, Twitter

Query reformulation modifies the original query with aim of providing a better representation of a user's information need and consequently improving the retrieval performance. Previous reformulation models typically generate words and phrases related to the original query, but do not consider how these words and phrases would fit together in realistic or actual queries.

A novel framework is proposed that models reformulation as a distribution of reformulated queries, where each reformulated query is associated with a probability indicating its importance. This framework considers a reformulated query as the basic unit and can capture the important query-level dependencies between words and phrases in a realistic or actual query. Since a reformulated query is the output of applying a single or multiple reformulation operations, this framework combines different operations such as query segmentation, query substitution and query deletion within the same framework. Moreover, a retrieval model is considered as an integrated part of this framework, which considers the reformulation model and the retrieval model jointly.

Tingxin Yan; Designing Novel Mobile Systems By Exploiting Sensing, User Context, and Crowdsourcing; (Deepak Ganesan, Advisor); Sept. 2012; Assistant Professor, Department of Computer Science & Engineering, Univ. of Arkansas

We focus first on the domain of context-aware mobile systems. We study the problem of how to incorporate user context into mobile operating system design by presenting a system named FALCON--an application-preloading engine, which infers user context from sensing data, learns associations between user context and application usage, and preloads applications to improve their responsiveness. Compared with existing caching schemes, Falcon improves the application responsiveness by 2 times.

The second focus is on the domain of participatory sensing. We explore the problem of improving image search accuracy by presenting a mobile service named CrowdSearch that achieves over 95% accuracy consistently across multiple categories of images with response time in a minute.

We then study the problem of image search under resource constraints, by presenting a mobile system named SenSearch that turns smartphones into micro image search engines, where images are collected, indexed, and transmitted using compact features that are two magnitudes smaller than their raw format. SenSearch improves the energy and bandwidth cost by five times compared with existing image search engines.