Title: High-Performance Complex Event Processing for Decision Analytics

Wednesday, 09/23/2015 2:00pm to 4:00pm
Computer Science Building, Room 151
Ph.D. Seminar
Speaker: Haopeng Zhang

Complex Event Processing (CEP) systems are becoming increasingly popular for decision analytics in domains such as financial services, transportation, cluster monitoring, supply chain management, business process management, and health care. These systems collect or create high volumes of events, which form event streams that often need to be processed in real time. CEP queries apply filtering, correlation, aggregation, and transformation to these streams to derive high-level, actionable information. In this thesis, we make contributions that render CEP technologies more applicable, more efficient, and more explainable for decision analytics.

Our first contribution is to support CEP queries over streams with imprecise timestamps, which was infeasible before this work. Existing CEP systems assume that the occurrence time of each event is known precisely; however, we observe that event occurrence times are often unknown or imprecise. Therefore, we propose a temporal model that assigns a time interval to each event to represent all of its possible occurrence times, along with two evaluation frameworks and optimizations within these frameworks. Our new approach achieves high efficiency for a wide range of workloads tested using both real traces and synthetic datasets. This contribution makes CEP techniques applicable to more application scenarios.

Our second contribution is to significantly improve the evaluation performance of expensive queries in CEP. These expensive queries involve Kleene closure patterns, flexible event selection strategies, and events with imprecise timestamps. We develop a series of optimizations after analyzing the complexity of these pattern queries. Microbenchmark results show superior performance of our system for expensive pattern queries, while most state-of-the-art systems suffer from poor performance. A thorough case study on Hadoop cluster monitoring further demonstrates the efficiency and effectiveness of our proposed techniques.

The last problem addressed in this thesis is explaining anomalies in CEP-based monitoring. CEP queries are widely used for monitoring purposes. When users observe abnormal status in the monitoring results, they want to explain the anomalies quickly and make decisions accordingly. However, due to the high complexity of monitored systems, it is overwhelmingly complicated and extremely time-consuming for users to find explanations by manually searching through huge volumes of logs. While existing systems are unable to assist with this problem, we develop an enhanced system that generates explanations for user-annotated anomalies. Multiple use cases on real data demonstrate that the generated explanations effectively reveal the ground truth and that the system is efficient.

Advisor: Yanlei Diao