Robust Algorithms for Clustering with Applications to Data Integration

14 May
Friday, 05/14/2021 2:00pm to 4:00pm
Zoom Meeting
PhD Thesis Defense

Zoom Meeting: https://umass-amherst.zoom.us/j/97353450461?pwd=a2N3L04rQ1pjcHlXUzkrMHRKd0xqQT09
Meeting ID: 973 5345 0461
Passcode: sgdefense

Abstract: A growing number of data-based applications are used for decisions that have far-reaching consequences and significant societal impact. Entity resolution, community detection, and taxonomy construction are some of the building blocks of these applications, and clustering is the fundamental concept underlying these methods. The importance of accurate, robust, and scalable clustering methods therefore cannot be overstated. We tackle the various facets of clustering with a multi-pronged approach described below.

a) While identifying clusters that refer to different entities is challenging for automated strategies, it is relatively easy for humans. We study the robustness of clustering methods that leverage supervision through an oracle, i.e., an abstraction of crowdsourcing. Additionally, we focus on scalability to handle web-scale datasets.

b) In community detection applications, a common obstacle in evaluating the quality of clustering techniques is the lack of ground-truth data. We propose a generative model to capture interactions between records that belong to different clusters and devise techniques for efficient cluster recovery.

Advisor: Barna Saha