Contextual and Spatio-temporal Data Cleaning

Thursday, 12/06/2018 4:00pm to 5:00pm
Computer Science Building, Room 150/151
Speaker: Fei Chiang

Abstract: Organizations find it increasingly difficult to reap value from their data because of poor data quality. Real data is rarely error free: it contains incomplete, inconsistent, and stale values, which lead to inaccurate and out-of-date data analysis downstream. Addressing data inconsistency requires not only reconciling differing syntactic references to an entity, but often also incorporating domain expertise to interpret the data correctly. For example, a reference to 'jaguar' may be interpreted as an animal or as a vehicle. Secondly, having up-to-date (or current) data is important for timely data analysis. Cleaning stale values goes beyond relying on timestamps, especially when timestamps may be missing, inaccurate, or incomplete.

In this talk, I will present our work towards achieving consistent and up-to-date data. First, I will discuss contextual data cleaning, which uses a new class of data integrity constraints that tightly integrate domain semantics from an ontology. Second, I will argue that data currency is a relative notion based on individual spatio-temporal update patterns, and that these patterns can be learned and predicted. I will present our framework for achieving these two objectives and give a brief overview of recent extensions with applications to knowledge fusion.

Bio: Fei Chiang is an Assistant Professor in the Department of Computing and Software at McMaster University. She is a Faculty Fellow at the IBM Centre for Advanced Studies, and served as an inaugural Associate Director of the McMaster MacData Institute. She received her M.Math. from the University of Waterloo, and B.Sc. and Ph.D. degrees from the University of Toronto, all in Computer Science. Her research interests are in data quality, data cleaning, data privacy, and text mining. She holds four patents for her work in self-managing database systems. Her work has been featured in the Southern Ontario Smart Computing Impact Report. She is a recipient of the Dean's Teaching Honour Roll and a 2018 Ontario Early Researcher Award.

