High-Performance Complex Event Processing for Decision Analytics

13 May
Tuesday, 05/13/2014 6:00am to 8:00am
Ph.D. Dissertation Proposal Defense

Haopeng Zhang

Computer Science Building, Room 142

Complex Event Processing (CEP) systems are becoming increasingly popular in decision-analytics domains such as financial services, transportation, cluster monitoring, supply chain management, business process management, and health care. These systems collect or create high volumes of events, which form an event stream that often needs to be processed in real time. CEP queries are applied for filtering, correlation, aggregation, and transformation to derive high-level, actionable information. Tasks for CEP systems fall into two categories: passive monitoring and proactive monitoring. In passive monitoring, users know their exact needs and express them in a CEP query language, and CEP engines then evaluate the queries against incoming data events. In proactive monitoring, users cannot tell exactly what they are looking for and need to work with CEP engines to figure out the query. My thesis makes contributions in both categories.

For passive monitoring, the first problem we solve is applying CEP queries over streams with imprecise timestamps, which was infeasible before this work. Existing CEP systems assume that the occurrence time of each event is known precisely; however, we observe that event occurrence times are often unknown or imprecise. Therefore, we propose a temporal model that assigns a time interval to each event to represent all of its possible occurrence times, along with two evaluation frameworks and optimizations within these frameworks. Our new approach achieves high efficiency for a wide range of workloads tested using both real traces and synthetic datasets. This contribution makes CEP techniques applicable to more application scenarios.
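
To make the interval-based temporal model concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the evaluation frameworks proposed in the thesis: the names UncertainEvent, may_precede, and must_precede are hypothetical, and timestamps are assumed to be integers with inclusive bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainEvent:
    """Hypothetical event whose occurrence time is only known to lie in an interval."""
    name: str
    earliest: int  # lower bound of the possible occurrence times (inclusive)
    latest: int    # upper bound of the possible occurrence times (inclusive)


def may_precede(a: UncertainEvent, b: UncertainEvent) -> bool:
    """True if at least one choice of occurrence times puts a strictly before b."""
    return a.earliest < b.latest


def must_precede(a: UncertainEvent, b: UncertainEvent) -> bool:
    """True if every choice of occurrence times puts a strictly before b."""
    return a.latest < b.earliest


# Assumed sample data: two events with overlapping uncertainty intervals.
login = UncertainEvent("login", earliest=10, latest=14)
purchase = UncertainEvent("purchase", earliest=12, latest=18)

print(may_precede(login, purchase))   # a "login then purchase" match is possible
print(must_precede(login, purchase))  # but it is not certain
```

With precise timestamps the two predicates coincide; with intervals, a sequence pattern can be a possible match without being a certain one, which is why evaluation over such streams needs its own treatment.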

Another contribution for passive monitoring is that we significantly improve the evaluation performance of expensive CEP queries. These expensive queries involve Kleene closure patterns, flexible event selection strategies, and events with imprecise timestamps. It is infeasible to evaluate these useful yet complex queries using existing approaches because of performance bottlenecks. We develop a series of optimizations after analyzing the complexity of these pattern queries. Microbenchmark results show superior performance of our system for expensive pattern queries, while most state-of-the-art systems suffer from poor performance. A thorough case study on Hadoop cluster monitoring further demonstrates the efficiency and effectiveness of our proposed techniques.

The last problem we address in my thesis concerns proactive monitoring. We start by explaining anomalies in the results of CEP queries. When users find anomalies in the passive monitoring results, existing CEP systems are unable to provide any explanation. Our new system compares the context events of user-annotated anomalies with the context of normal status, and finds the most differentiating features as potential explanations for those anomalies. A series of optimizations is also proposed to improve effectiveness and efficiency. Finally, the generated explanations can be translated into CEP queries for proactively recognizing future anomalies.
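
The idea of ranking features by how well they separate anomalous context from normal context can be sketched as follows. This is a deliberately simple stand-in, not the method proposed in the thesis: the function rank_features is hypothetical, the score is just the absolute difference of per-feature means, and the feature names and values are made-up sample data assumed to be on comparable scales.

```python
from statistics import mean


def rank_features(anomalous, normal):
    """Rank feature names by the absolute difference of their means
    between anomalous and normal context records (largest first).

    Assumption: every record is a dict with the same numeric features,
    and the features are on comparable scales.
    """
    features = list(anomalous[0].keys())
    scores = {
        f: abs(mean(r[f] for r in anomalous) - mean(r[f] for r in normal))
        for f in features
    }
    return sorted(features, key=lambda f: scores[f], reverse=True)


# Assumed sample data: context records around annotated anomalies vs. normal periods.
anomalous_context = [
    {"cpu_load": 0.90, "memory_use": 0.50},
    {"cpu_load": 0.95, "memory_use": 0.55},
]
normal_context = [
    {"cpu_load": 0.30, "memory_use": 0.50},
    {"cpu_load": 0.35, "memory_use": 0.50},
]

ranking = rank_features(anomalous_context, normal_context)
print(ranking)
```

In this sample, cpu_load differs far more between the two groups than memory_use, so it would be offered first as a candidate explanation. A top-ranked feature together with a threshold could then be written as a predicate in a CEP query.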

Advisor: Yanlei Diao