David Arbour; Methods for Enabling Causal Inference in Relational Domains; David Jensen, Advisor; May 2017; Research Scientist, Facebook

Abstract: The analysis of data from complex systems is quickly becoming a fundamental aspect of modern business, government, and science. The field of causal learning is concerned with developing statistical methods that allow practitioners to make inferences about unseen interventions. This field has seen significant advances in recent years. However, the vast majority of this work assumes that data instances are independent, whereas many systems are best described in terms of interconnected instances, i.e., relational systems. This discrepancy prevents causal inference techniques from being reliably applied in many real-world settings.

In this thesis, I will present three contributions to the field of causal inference that seek to enable the analysis of relational systems. First, I will present theory for consistently testing statistical dependence in relational domains. I then show how the significance of this test can be measured in practice using a novel bootstrap method for structured domains. Second, I show that statistical dependence in relational domains is inherently asymmetric, implying a simple test of causal direction from observational data. This test requires no assumptions about either the marginal distributions of variables or the functional form of dependence. Third, I describe relational causal adjustment, a procedure to identify the effects of arbitrary interventions from observational relational data via an extension of Pearl's backdoor criterion. A series of evaluations on synthetic domains shows that the estimates obtained by relational causal adjustment are close to those obtained from explicit experimentation.

James Atwood; Problems in Graph-Structured Modeling and Learning; Donald Towsley, Advisor; May 2017; Software Engineer, Google

Abstract: This thesis investigates three problems in graph-structured modeling and learning.  We first present a method for efficiently generating large instances from nonlinear preferential attachment models of network structure.  This is followed by a description of search-convolutional neural networks, which extend the notion of convolution to general graph-structured data and offer comparable performance to state-of-the-art relational classification methods at a faster speed.  We conclude with an optimal privacy-protection method for users of online services that remains effective when users have poor knowledge of an adversary's behavior.

Heather Conboy; Automatic Derivation of Requirements for Components Used in Human-Intensive Systems; Lori Clarke, George Avrunin, Advisors; May 2017

Abstract: Human-intensive systems (HISs), which involve coordination among human participants, software applications, and hardware devices to achieve system missions, are increasingly prevalent in safety-critical domains (e.g., healthcare). Such systems are often complex, involving concurrent activities, a variety of human roles requiring extensive domain expertise, and non-nominal (or exceptional) situations that must be recognized and then responded to appropriately. For these systems, it is difficult but important to determine requirements for the software and hardware components that ensure the overall system requirements are satisfied. In this thesis, we investigated employing interface synthesis methods developed for software systems to automatically derive requirements that restrict the interactions between a selected component and the remaining HIS to prevent any violations of the overall system requirements. 

For this work, each HIS is formally modeled as a composition of the selected component models and a process model. The process model describes the recommended ways to achieve the system mission, including how each component will be used. Since the HISs are often complex, we investigated an interface synthesis method that employs a regular language learning algorithm to iteratively refine the current derived requirement based on counterexamples generated by a model checker. Such an iterative refinement strategy tries to reduce the maximum space needed to improve scalability. Because this learning-based interface synthesis method often did not scale well, even with that iterative refinement strategy, we investigated several learning and model checking optimizations. These optimizations improved the performance of the requirement derivation but influenced the counterexamples generated by the model checker, which impacts the permissiveness of the derived requirements. It is crucial that the derived requirements for the components are restrictive enough to prevent system requirement violations but also permissive enough that the effectiveness of the components is not substantially reduced. Thus, we investigated several counterexample generation heuristics to try to improve the permissiveness of the derived requirements by allowing more of the system executions that satisfy the system requirement. Although many of the optimizations and heuristics have been presented previously, this thesis shows how their selective combination affects the requirement derivation results in terms of performance as well as permissiveness. 

For comparison purposes, we also investigated a direct interface synthesis method that conceptually generates the full reachability graph of the system model and then performs one refinement of that reachability graph to produce the derived requirement. This interface synthesis method produces very permissive derived requirements that allow all system executions that prevent violations of the system requirements. The interface synthesis method, however, makes no attempt to reduce the maximum space needed for the requirement derivation. For both the learning-based and direct interface synthesis methods, the derived requirements often reflect the inherent complexity of the HISs and thus it is challenging for the various stakeholders such as component developers, certification agencies, buyers, and users to understand these derived requirements. To improve the understandability of the derived requirements, we explored creating views of these requirements that employ higher-level features such as decomposition to abstract away or highlight certain aspects of the requirements to decrease the cognitive load on the stakeholders.

To evaluate such an automated requirement derivation approach, we developed a requirement deriver toolset that incorporates both a direct and learning-based requirement deriver, the optimizations, heuristics, and views. We applied this toolset to case studies in two important domains: healthcare and election administration. For the direct requirement deriver and the learning-based requirement deriver that employs the different heuristics, we compared the requirement deriver results in terms of performance as well as permissiveness. For each derived requirement, we applied multiple views, whenever possible, to try to improve the understandability of that requirement. This approach produced derived requirements and their views that provided insights about the interactions between the selected components and the processes in which those components will be used. Such derived requirements can be used to modify the components, processes, or both to ensure that the overall system requirements are satisfied. Additionally, after the HISs or their overall system requirements evolve, the approach can be reapplied to help to safely evolve the derived requirements for the components and processes.

Mostafa Dehghan Shirehpaz; Design, Analysis and Optimization of Cache Systems; Donald Towsley, Advisor; May 2017; Software Engineer, Google

Abstract: The increase in data traffic over the past years is predicted to continue even more aggressively in the years to come. However, traditional methods such as increasing the amount of spectrum or deploying more base stations are no longer sufficient to cope with the traffic growth. Caching is hence recognized as one of the most effective means to improve the performance of Internet services by bringing content closer to end-users. Although the benefits of in-network content caching have been demonstrated in various contexts, caches introduce new challenges in terms of modeling and analyzing network performance. Building on analytical results for Time-To-Live caches and the flexibility they provide in modeling caches, this thesis investigates various aspects in which caching affects network design and performance. The complexity of making optimal routing and content placement decisions is studied first. After showing that implementing the optimal strategy is infeasible, low-complexity techniques are developed to achieve near-optimal performance in terms of the delay observed by end-users. The problem of differentiated cache services is studied next, guided by the question: how can Content Distribution Networks implement caching policies to provide differentiated services to different content providers? A utility-maximization framework is formulated to design caching policies with fairness considerations and implications on the market economy for cache service providers and content publishers. An online algorithm is also developed to implement the utility-based cache policies with no a priori information on the number of contents or file popularities. This thesis also analyzes caches in conjunction with data structures, e.g. the Pending Interest Table, proposed in future Internet architecture designs such as Named Data Networking. The analysis provides the means to understand system performance under different circumstances and to develop techniques to achieve optimal performance.
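As a minimal illustration of the Time-To-Live caches this analysis builds on (the class and request sequence below are invented for the sketch, not taken from the thesis), each cached item simply expires a fixed time after insertion, with the timer set only on a miss (a "non-reset" TTL cache):

```python
class TTLCache:
    """Minimal Time-To-Live cache: an item expires ttl time units after it is
    inserted; the timer is set only on a miss (a "non-reset" TTL cache)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.expiry = {}          # item -> expiration time
        self.hits = 0
        self.misses = 0

    def request(self, item, now):
        """Return True on a hit; on a miss, (re)insert the item."""
        if self.expiry.get(item, float("-inf")) > now:
            self.hits += 1
            return True
        self.misses += 1
        self.expiry[item] = now + self.ttl
        return False

# One item requested every 3 time units with a TTL of 5: requests alternate
# between a miss (timer expired) and a hit (timer still running).
cache = TTLCache(ttl=5)
results = [cache.request("a", t) for t in range(0, 30, 3)]
hit_ratio = cache.hits / (cache.hits + cache.misses)
```

The appeal of the TTL abstraction, as the abstract notes, is analytical flexibility: the hit probability of an item is a simple function of its TTL and request rate, which makes policies expressible as choices of per-item timers.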

Shiri Dori-Hacohen; Controversy Analysis and Detection; James Allan, Advisor; September 2017; Founder, AuCoDe

Abstract: Seeking information on a controversial topic is often a complex task. Alerting users about controversial search results can encourage critical literacy, promote healthy civic discourse, and counteract the "filter bubble" effect, and would therefore be a useful feature in a search engine or browser extension. Additionally, presenting information to the user about the different stances or sides of the debate can help her navigate the landscape of search results beyond a simple "list of 10 links". This thesis has made strides in the emerging niche of controversy detection and analysis. The body of work in this thesis revolves around two themes: computational models of controversy, and controversies occurring in neighborhoods of topics. Our broad contributions are: (1) presenting a theoretical framework for modeling controversy as contention among populations; (2) constructing the first automated approach to detecting controversy on the web, using a KNN classifier that maps from the web to similar Wikipedia articles; and (3) proposing a novel controversy detection method for Wikipedia that employs a stacked model combining link structure and similarity. We conclude this work by discussing the challenging technical, societal, and ethical implications of this emerging research area and proposing avenues for future work.
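The KNN idea in contribution (2) can be sketched as follows; the similarity scores, article titles, and labels below are toy stand-ins, not the thesis's actual features or data:

```python
def knn_controversy_score(similar_articles, is_controversial, k=3):
    """Score a query web page by the controversy labels of the k Wikipedia
    articles most similar to it (a toy stand-in for a KNN mapping).
    similar_articles: list of (title, similarity); is_controversial: title -> 0/1.
    Returns the fraction of the k nearest neighbors labeled controversial."""
    top_k = sorted(similar_articles, key=lambda pair: pair[1], reverse=True)[:k]
    votes = [is_controversial[title] for title, _ in top_k]
    return sum(votes) / len(votes)

# Hypothetical similarities between one query page and Wikipedia articles
similar = [("Abortion", 0.9), ("Climate change", 0.8), ("Cheese", 0.7), ("Paris", 0.4)]
labels = {"Abortion": 1, "Climate change": 1, "Cheese": 0, "Paris": 0}
score = knn_controversy_score(similar, labels, k=3)  # 2 of the top 3 are controversial
```

A score above some threshold would flag the page as controversial; the real system derives similarity from the page's text rather than from given numbers.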

Fabricio Murai Ferreira; Applications of Sampling and Estimation on Networks; Donald Towsley, Advisor; September 2016; Assistant Professor, Department of Computer Science, Federal University of Minas Gerais

Abstract: A general trend has permeated the sciences over the last two decades: the research focus has often shifted to studying entities collectively rather than in isolation, giving rise to a field referred to as network science. The key to understanding the importance of this field lies in the relationship between structure and function on networks: structure can shape the effectiveness and efficiency of processes that run on top of a network; in turn, these very processes can shape the structure or formation of networks. As a result, a major effort in this field has focused on (i) characterizing and modeling real networks and (ii) studying processes and developing algorithms to enhance/thwart the performance of these processes on networks. In Chapter 1, we study the problem of characterizing directed and undirected networks through multidimensional random walk-based sampling. In Chapter 2, we study the set size distribution estimation problem from an information-theoretic standpoint, which has application to characterizing the in-degree distribution in large graphs. The last chapter focuses on the design of algorithms for networks. More precisely, in Chapter 3, we study the problem of searching networks to find nodes that exhibit a specific trait given a budget by learning a model from node attributes and structural properties, with applications to recruitment on online social networks. We consider a fixed budget scenario with unknown network topology, where recruitment attempts are not free and reveal new parts of the graph.
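To give a flavor of random walk-based characterization (Chapter 1), here is a minimal sketch, not the thesis's multidimensional method: a plain random walk visits nodes in proportion to their degree, so reweighting each visited node by 1/deg(v) yields a consistent estimate of the average degree:

```python
import random

def random_walk_avg_degree(adj, steps, seed=0):
    """Estimate the average degree of a connected undirected graph from one
    random walk. The walk's stationary distribution is proportional to degree,
    so avg_deg ~= steps / sum_i 1/deg(v_i) over the visited nodes v_i."""
    rng = random.Random(seed)
    v = next(iter(adj))           # arbitrary start node
    inv_deg_sum = 0.0
    for _ in range(steps):
        inv_deg_sum += 1.0 / len(adj[v])
        v = rng.choice(adj[v])    # step to a uniformly random neighbor
    return steps / inv_deg_sum

# Toy graph: star center c linked to a, b, d, plus edge a-b.
# Degrees are 3, 2, 2, 1, so the true average degree is 2.0.
graph = {"c": ["a", "b", "d"], "a": ["c", "b"], "b": ["c", "a"], "d": ["c"]}
est = random_walk_avg_degree(graph, steps=20000)
```

The reweighting corrects the walk's bias toward high-degree nodes; without it, the naive average of visited degrees would overestimate the mean.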

Lisa Friedland; Detecting Anomalously Similar Entities in Unlabeled Data; David Jensen, Advisor; September 2016; Postdoctoral Research Associate, Northeastern University

Abstract: The goal of this work is to detect closely-linked entities within a data set. The entities of interest have a tie causing them to be similar, such as a shared origin or a channel of influence. Given a collection of people or other entities with their attributes or behavior, we identify unusually similar pairs, and we pose the question: Are these two people linked, or can their similarity be explained by chance?

Computing similarities is a core operation in many domains, but two constraints differentiate this problem. First, the score assigned to a pair should account for the probability of a coincidental match. Second, no training data is provided; we must learn about the system from the unlabeled data and make reasonable assumptions about the linked pairs. This problem has applications to social network analysis, where it can be valuable to identify implicit relationships among people from indicators of coordinated activity. It also arises in situations where we must decide whether two similar observations correspond to two different entities or the same entity observed twice.

This dissertation explores how to assess such ties and, in particular, how the similarity scores should depend on not only the two entities in question but also properties of the entire data set. We develop scoring functions that incorporate both the similarity and rarity of a pair. Then, using these functions, we investigate the statistical power of a data set to reveal (or conceal) such pairs.

In this dissertation, we develop generative models of linked pairs and independent entities and use them to derive scoring functions for pairs in three different domains: people with job histories, Gaussian-distributed points in Euclidean space, and people (or entities) in a bipartite affiliation graph. For the first, we present a case study in fraud detection that highlights the potential, as well as the complexities, of using these methods to address real-world problems. In the latter two domains, we develop an inference framework to estimate whether two entities were more likely generated independently or as a pair. In these settings, we analyze how the scoring function works in terms of similarity and rarity; how well it can detect pairs as a function of the data set; and how it differs from existing similarity functions when applied to real data.
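For the Gaussian domain, the "similarity plus rarity" intuition can be sketched as a log-likelihood ratio in one dimension; the models and parameter below are simplified illustrations, not the dissertation's actual scoring functions:

```python
import math

def log_normal_pdf(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def pair_score(x1, x2, tau2=0.01):
    """Log-likelihood ratio of 'linked' (x2 = x1 + small noise, variance tau2)
    versus 'independent' (x2 drawn fresh from N(0, 1)). Both models treat
    x1 ~ N(0, 1), so its term cancels from the ratio."""
    return log_normal_pdf(x2, x1, tau2) - log_normal_pdf(x2, 0.0, 1.0)

# Similarity alone is not enough: an equally similar pair in a rare region
# of the distribution is stronger evidence of a link.
common = pair_score(0.10, 0.15)   # similar, but near the mode -> plausible by chance
rare = pair_score(3.00, 3.05)     # equally similar, far out in the tail
```

The independent model assigns low probability to matching tail values, so the rare pair scores higher even though the two pairs are equally close; this is the sense in which the score depends on the whole data distribution, not just the two entities.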

Tian Guo; Elastic Resource Management in Distributed Clouds; Prashant Shenoy, Advisor; September 2016; Assistant Research Professor, Department of Computer Science, Worcester Polytechnic Institute

Abstract: Cloud platforms have become increasingly distributed, and more applications are being hosted in distributed clouds to take advantage of low network latency. Because cloud-based applications observe workloads that are dynamic in both time and space, it is important to provision an appropriate amount of server resources in the closest data center location to satisfy client performance requirements. The dynamic and geo-distributed nature of client workloads calls for an automatic provisioning mechanism that fully utilizes the underlying distributed clouds. In this thesis, I investigate the challenges of placing and provisioning application servers of different characteristics in distributed data centers in order to improve end-user performance, and argue that distributed cloud platforms need provisioning techniques that are aware of application latency requirements as well as the geographic workload distribution. I build cloud resource management systems that are seeded with application-specific models to perform geo-aware provisioning. My thesis makes the following contributions: (i) First, I investigate how to optimize the location and performance of virtual desktops in distributed clouds. (ii) Next, I look at how to handle both temporal and spatial workload fluctuations of virtualized multi-tier applications. (iii) I then study how to provision distributed database clouds that serve as the backend for geo-distributed applications. (iv) Finally, I improve provisioning efficiency by taking component dependencies into account. I conduct experiments using a public distributed cloud platform and a global research network, and show that by integrating geo-aware approaches, distributed clouds can improve application performance by up to 90%.

Xin He; Application-aware Resource Management for Cloud Platforms; Prashant Shenoy, Advisor; September 2016; Senior Engineer, Oracle

Abstract: Cloud computing has become increasingly popular in recent years. The benefits of cloud platforms include ease of application deployment, a pay-as-you-go model, and the ability to scale resources up or down based on an application's workload. Today's cloud platforms are being used to host increasingly complex distributed and parallel applications. The main premise of this thesis is that application-aware resource management techniques are better suited to distributed cloud applications than a systems-level, one-size-fits-all approach.

In this thesis, I study cloud-based resource management techniques with a particular emphasis on how application-aware approaches can be used to improve system resource utilization, enhance application performance, and reduce cost.

I first study always-on interactive applications that run on transient cloud servers such as Amazon spot instances. I show that by combining techniques like nested virtualization, live migration, and lazy restoration with intelligent bidding strategies, it is feasible to provide high availability to such applications while significantly reducing cost. I next study how to improve the performance of parallel data processing applications like Hadoop and Spark that run in the cloud. I argue that network I/O contention in Hadoop can impact application throughput, and implement a collaborative application-aware network and task scheduler using software-defined networking. By combining flow scheduling with task scheduling, our system can effectively avoid network contention and improve Hadoop's performance. I then investigate similar issues in Spark and find that task scheduling is more important for Spark jobs. I propose a network-aware task scheduling method that can adaptively schedule tasks for different types of jobs without system tuning and significantly improve Spark's performance. Finally, I study how to improve the performance of networking functions that run in virtualized servers. Specifically, I carry out an empirical evaluation of intrusion detection systems (IDS) to understand the behavior of such applications and use application-aware methods to enhance their performance.

Kaituo Li; Combining Static and Dynamic Analysis for Bug Detection and Program Understanding; Yannis Smaragdakis, Advisor; September 2016; Software Engineer, Amazon

Abstract: This work proposes new combinations of static and dynamic analysis for bug detection and program understanding. There are three related but largely independent directions:

  • In the area of dynamic invariant inference, we improve the consistency of dynamically discovered invariants by taking into account second-order constraints that encode knowledge about invariants; the second-order constraints are either supplied by the programmer or vetted by the programmer (among candidate constraints suggested automatically);
  • In the area of testing dataflow (esp. map-reduce) programs, our tool, SEDGE, achieves higher testing coverage by leveraging existing input data and generalizing them using a symbolic reasoning engine (a powerful SMT solver);
  • In the area of bug detection, we identify and present the concept of residual investigation: a dynamic analysis that serves as the runtime agent of a static analysis. Residual investigation identifies with higher certainty whether an error reported by the static analysis is likely true.

Yeon-sup Lim; On Leveraging Multipath Transport Protocol in Mobile Networks; Donald Towsley, Advisor; February 2017; Research Staff Member, IBM Research

Abstract: Multi-Path TCP (MPTCP) is a new transport protocol that enables mobile devices to use several physical paths simultaneously through multiple network interfaces. MPTCP is particularly useful for mobile devices, which usually have multiple wireless interfaces such as IEEE 802.11 (WiFi), cellular (3G/LTE), and Bluetooth. However, applying MPTCP to mobile devices raises new concerns, since these devices operate in harsh environments with intermittent path availability and power constraints. In this thesis, I present research that resolves these problems so that MPTCP can be practically deployed on mobile devices. First, I propose a cross-layer path management approach, which exploits information from the physical and medium access control layers, to increase path utilization of MPTCP over lossy links. Our experimental results show that this approach efficiently utilizes an intermittently available WiFi path, with throughput improvements of up to 72%. In addition to managing intermittent path availability, MPTCP needs to deal with the increased energy consumption due to the simultaneous operation of multiple network interfaces. Hence, it is important to understand the energy consumption behavior of MPTCP to determine when it should be deployed on mobile devices. To this end, I develop an energy model for MPTCP power consumption derived from experimental measurements taken using MPTCP on mobile devices. This model is the basis of the design of an improved, energy-efficient MPTCP (eMPTCP). Our experimental results show that eMPTCP reduces power consumption compared to MPTCP by up to 80% for small file downloads and up to 15% for large file downloads, while preserving the availability and robustness benefits of MPTCP.
Finally, building on these studies of the path utilization and energy behavior of MPTCP, I explore how video streaming, one of the most popular applications on mobile devices, should interact with MPTCP to achieve a better user experience in terms of streaming quality and energy consumption.

Chang Liu; Inference in Networking Systems with Designed Measurements; Donald Towsley, Advisor; February 2017; Software Development Engineer, Amazon Web Service

Abstract: Networking systems, consisting of network infrastructure and end-hosts, have become essential in supporting our daily communication, delivering huge amounts of content and large numbers of services, and providing large-scale distributed computing. To monitor and optimize the performance of such networking systems, or to provide flexible functionality for the applications running on top of them, it is important to know internal metrics such as link loss rates or path delays. These internal metrics are often not directly available, however, due to the scale and complexity of the networking systems. This motivates techniques for inferring internal metrics from available measurements.

In this thesis, I investigate inference methods for networking systems from multiple aspects. In the context of mapping users to servers in content delivery networks, we show that letting a user select a well-performing server from a set of servers randomly allocated to that user can lead to optimal server allocation; a key element is inferring the workload on the servers from performance feedback. For network tomography, where the objective is to estimate link metrics (loss rate, delay, etc.) using end-to-end measurements, we show that the information in each end-to-end measurement can be quantified by Fisher information, and that the estimation error of link metrics can be efficiently reduced if the allocation of measurements to paths is designed to maximize the overall information. Last but not least, in the context of finding the most robust path for routing from a source to a destination while minimizing the cost of exploring lossy paths, I investigate the trade-off between exploiting the estimated best path and taking the risk of exploring worse paths for more information, and develop adaptive learning methods. The performance of the developed techniques is evaluated with simulations.
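To illustrate information-maximizing measurement design in the simplest possible setting (independent per-path loss rates rather than the thesis's link-level tomography, with invented numbers), the Fisher information of n Bernoulli probes on a path with loss rate p is n/(p(1-p)), and minimizing the total Cramér-Rao bound under a probe budget gives a square-root allocation:

```python
import math

def fisher_info_bernoulli(p):
    """Per-probe Fisher information for a Bernoulli loss rate p."""
    return 1.0 / (p * (1.0 - p))

def allocate_probes(loss_rates, budget):
    """Split a probe budget across paths to minimize the total Cramer-Rao
    bound sum_i p_i(1-p_i)/n_i subject to sum_i n_i = budget; the optimum
    puts n_i proportional to sqrt(p_i(1-p_i))."""
    weights = [math.sqrt(p * (1.0 - p)) for p in loss_rates]
    total = sum(weights)
    return [budget * w / total for w in weights]

# A high-variance path (p = 0.5) deserves more probes than a low-variance one (p = 0.1)
allocation = allocate_probes([0.5, 0.1], budget=100)
```

In practice the loss rates are themselves unknown, so such designs are applied adaptively, re-allocating as estimates improve.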

Dirk Ruiken; Belief-Space Planning for Resourceful Manipulation and Mobility; Roderic Grupen, Advisor; May 2017; Industry

Abstract: Robots are increasingly expected to work in partially observable and unstructured environments. They need to select actions that exploit perceptual and motor resourcefulness to manage uncertainty based on the demands of the task and environment. The research in this dissertation makes two primary contributions. First, it develops a new concept in resourceful robot platforms called the UMass uBot and introduces the sixth and seventh in the uBot series. uBot-6 introduces multiple postural configurations that enable different modes of mobility and manipulation to meet the needs of a wide variety of tasks and environmental constraints. uBot-7 extends this with the use of series elastic actuators (SEAs) to improve manipulation capabilities and support safer operation around humans.

The resourcefulness of these robots is complemented with a belief-space planning framework that enables task-driven action selection in the context of the partially observable environment. The framework uses a compact but expressive state representation based on object models. We extend an existing affordance-based object model, called an aspect transition graph (ATG), with geometric information. This enables object-centric modeling of features and actions, making the model much more expressive without increasing the complexity. A novel task representation enables the belief-space planner to perform general object-centric tasks ranging from recognition to manipulation of objects. The approach supports the efficient handling of multi-object scenes. 

The combination of the physical platform and the planning framework are evaluated in two novel, challenging, partially observable planning domains. The ARcube mobile manipulation domain provides a large population of objects that are highly ambiguous. Objects can only be differentiated using multi-modal sensor information and manual interactions. In the dexterous mobility domain, a robot can employ multiple mobility modes to complete navigation tasks under a variety of possible environment constraints. The performance of the proposed approach is evaluated using experiments in simulation and on a real robot.

Seung Yeob Shin; Specification and Analysis of Resource Utilization Policies for Human-Intensive Systems;  Leon J. Osterweil, Yuriy Brun, Advisors; September 2016; Research Associate, University of Luxembourg

Abstract: Contemporary systems often require the effective support of many types of resources, each governed by complex utilization policies. Sound management of these resources plays a key role in assuring that these systems achieve their key goals. To help system developers make sound resource management decisions, I provide a resource utilization policy specification and analysis framework for (1) specifying very diverse kinds of resources and their potentially complex resource utilization policies, (2) dynamically evaluating the policies' effects on the outcomes achieved by systems utilizing the resources, and (3) formally verifying various kinds of properties of these systems.

Resource utilization policies range from simple, e.g., first-in-first-out, to extremely complex, responding to changes in system environment, state, and stimuli. Further, policies may at times conflict with each other, requiring conflict resolution strategies that add extra complexity. Prior specification approaches rely on relatively simple resource models that prevent the specification of complex utilization and conflict resolution policies. My approach (1) separates resource utilization policy concerns from resource characteristic and request specifications, (2) creates an expressive specification notation for constraint policies, and (3) creates a resource constraint conflict resolution capability. My approach enables creating specifications of policies that are sufficiently precise and detailed to support static and dynamic analyses of how these policies affect the properties of systems constrained or governed by these policies.

I provide a process- and resource-aware discrete-event simulator for simulating system executions that adhere to policies of resource utilization. The simulator integrates the existing JSim simulation engine with a separate resource management system. The separate architectural component makes it easy to keep track of resource utilization traces during a simulation run. My simulation framework facilitates considerable flexibility in the evaluation of diverse resource management decisions and powerful dynamic analyses.

Dynamic verification through simulation is inherently limited because of the impossibility of exhaustively simulating all scenarios. I complement this approach with static verification. Prior static resource analysis has supported the verification of only relatively simple resource utilization policies. My research utilizes powerful model checking techniques, building on the existing FLAVERS model checking tool, to verify properties of complex systems that are also verified to conform to complex resource utilization policies. My research demonstrates how to use systems such as FLAVERS to verify adherence to complex resource utilization policies as well as overall system properties, such as the absence of resource leaks and resource deadlocks.

I evaluated my approach working with a hospital emergency department domain expert, using detailed, expert-developed models of the processes and resource utilization policies of an emergency department. In doing this, my research demonstrates how my framework can be effective in guiding the domain expert towards making sound decisions about policies for the management of hospital resources, while also providing rigorously-based assurances that the guidance is reliable and well-founded.

My research makes the following contributions: (1) a specification language for resources and resource utilization policies for human-intensive systems, (2) a process- and resource-aware discrete-event simulation engine that creates simulations that adhere to the resource utilization policies, allowing for the dynamic evaluation of resource utilization policies, (3) a process- and resource-aware model checking technique that formally verifies system properties and adherence to resource utilization policies, and (4) validated and verified specifications of an emergency department healthcare system, demonstrating the utility of my approach.

Xiaojian Wu; Stochastic Network Design: Models and Scalable Algorithms; Shlomo Zilberstein, Dan Sheldon, Advisors; September 2016; Post-Doctoral Research Associate, Cornell University

Abstract: Many natural and social phenomena occur in networks. Examples include the spread of animals through habitat networks, the flow of traffic in road networks, and the spread of information over social networks. Management actions can be taken to modify the underlying networks and reshape these phenomena to benefit society, commerce, and the environment. For example, decision makers may want to restore the connectivity of river networks, helping fish access their historical habitats, by removing river barriers such as dams and culverts. Identifying a small set of cost-efficient actions from a large number of candidate actions is computationally challenging: for example, it is computationally hard to find the best set of barriers to remove under a budget constraint to optimize the connectivity of a river network.
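As a minimal illustration of the barrier-removal problem (a hypothetical toy instance, not the thesis's algorithms), an exact solution for tiny inputs can enumerate every removal subset within budget; the cost of this brute force grows exponentially with the number of barriers, which is why scalable algorithms are needed:

```python
from itertools import combinations

# Hypothetical toy instance: barriers on a river tree rooted at the mouth.
# Each barrier has a removal cost and the habitat area directly above it;
# fish reach a segment only if every barrier on the path below it is removed.
barriers = {          # barrier -> (cost, habitat_above, parent_barrier)
    "A": (3, 10.0, None),
    "B": (2,  6.0, "A"),
    "C": (4,  8.0, "A"),
}
budget = 5

def accessible_habitat(removed):
    total = 0.0
    for b, (_, habitat, _) in barriers.items():
        # walk down toward the mouth, checking every barrier is removed
        node, passable = b, True
        while node is not None:
            if node not in removed:
                passable = False
                break
            node = barriers[node][2]
        if passable:
            total += habitat
    return total

# brute force: best subset of barriers to remove within the budget
best = max(
    (subset for r in range(len(barriers) + 1)
     for subset in combinations(barriers, r)
     if sum(barriers[b][0] for b in subset) <= budget),
    key=lambda s: accessible_habitat(set(s)),
)
print(set(best), accessible_habitat(set(best)))
```

Here removing A alone opens 10.0 units of habitat, but removing A and B together (total cost 5) opens 16.0, the optimum under this budget.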

In my thesis, I consider a wide range of such network optimization problems and provide several fast algorithms that find high-quality solutions. First, I define a general framework called stochastic network design that formulates these problems as a class of stochastic optimization problems. Then, I describe fast algorithms for solving problems within the framework under three settings of increasing generality, showing how the problem complexity and the algorithms change across these settings. Finally, I present the empirical performance of these algorithms on three real-world applications in computational sustainability and emergency response.

Haopeng Zhang; High-Performance Complex Event Processing for Decision Analytics; Yanlei Diao, Advisor; May 2017; Software Engineer, Google, Inc.

Abstract: Complex Event Processing (CEP) systems are becoming increasingly popular in decision-analytics domains such as financial services, transportation, cluster monitoring, supply chain management, business process management, and health care. These systems collect or create high volumes of events, which form an event stream that often must be processed in real time. CEP queries perform filtering, correlation, aggregation, and transformation to derive high-level, actionable information. Tasks for CEP systems fall into two categories: passive monitoring and proactive monitoring. In passive monitoring, users know their exact needs and express them in CEP query languages, which CEP engines evaluate against incoming data events; in proactive monitoring, users cannot say exactly what they are looking for and must work with the CEP engine to discover the query. My thesis contributes to both categories.

For passive monitoring, the first problem we solve is applying CEP queries over streams with imprecise timestamps, which was infeasible before this work. Existing CEP systems assume that the occurrence time of each event is known precisely; however, we observe that event occurrence times are often unknown or imprecise. We therefore propose a temporal model that assigns a time interval to each event to represent all of its possible occurrence times, two evaluation frameworks, and optimizations within these frameworks. Our approach achieves high efficiency on a wide range of workloads tested using both real traces and synthetic datasets, making CEP techniques applicable to more application scenarios.
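One way to picture the interval-based temporal model (a simplified sketch with assumed field names, not the thesis's actual evaluation frameworks) is that a sequence pattern SEQ(A, B) may match whenever some choice of occurrence points inside the two events' intervals is ordered correctly:

```python
from dataclasses import dataclass

# Hypothetical event representation: each event carries [lo, hi], the set
# of its possible occurrence times.
@dataclass
class Event:
    kind: str
    lo: float   # earliest possible occurrence time
    hi: float   # latest possible occurrence time

# Under a "possible occurrence" semantics, SEQ(a, b) can match whenever
# some t_a in [a.lo, a.hi] and t_b in [b.lo, b.hi] satisfy t_a < t_b,
# which holds exactly when a.lo < b.hi.
def may_precede(a, b):
    return a.lo < b.hi

a = Event("A", 1.0, 5.0)
b = Event("B", 3.0, 4.0)
print(may_precede(a, b))  # True: A could occur at t=2, B at t=3
```

With precise timestamps each interval collapses to a point and this reduces to the ordinary comparison, so point-based streams remain a special case.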

A second contribution to passive monitoring significantly improves the evaluation performance of expensive CEP queries: those involving Kleene closure patterns, flexible event selection strategies, and events with imprecise timestamps. These useful yet complex queries are infeasible to evaluate with existing systems due to performance bottlenecks. After analyzing the complexity of these pattern queries, we develop a series of optimizations. Microbenchmark results show superior performance of our system on expensive pattern queries, while most state-of-the-art systems suffer from poor performance. A thorough case study on Hadoop cluster monitoring further demonstrates the efficiency and effectiveness of our proposed techniques.

The last problem my thesis addresses concerns proactive monitoring, starting with explaining anomalies in the results of CEP queries. When users find anomalies in passive monitoring results, existing CEP systems cannot provide any explanation. Our new system compares the context events surrounding user-annotated anomalies with the context of normal periods, and identifies the most differentiating features as potential explanations for those anomalies. A series of optimizations further improves effectiveness and efficiency. Finally, the generated explanations can be translated into CEP queries for proactively recognizing future anomalies.
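A minimal sketch of ranking differentiating features (hypothetical data and a simple frequency-difference score; the actual system's comparison is far more sophisticated) contrasts feature frequencies between anomaly and normal contexts:

```python
from collections import Counter

# Toy sketch (hypothetical data): each context is a bag of event features
# observed in the window around an annotated anomaly or a normal period.
anomaly_ctx = ["disk_full", "retry", "retry", "gc_pause"]
normal_ctx  = ["retry", "heartbeat", "heartbeat", "gc_pause"]

def differentiating_features(anom, norm, top_k=2):
    fa, fn = Counter(anom), Counter(norm)
    n_a, n_n = len(anom), len(norm)
    # score = difference in relative frequency between the two contexts;
    # features far more common near anomalies score highest
    score = {f: fa[f] / n_a - fn[f] / n_n for f in set(anom) | set(norm)}
    return sorted(score, key=score.get, reverse=True)[:top_k]

print(differentiating_features(anomaly_ctx, normal_ctx))
```

Here "disk_full" and "retry" are over-represented in the anomaly context, so they surface as candidate explanations; each such feature could then be turned into a pattern query that watches for it going forward.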