
Reasoning About User Feedback Under Identity Uncertainty in Knowledge Base Construction

Wednesday, 08/05/2020 5:00pm to 7:00pm
Zoom Meeting
PhD Thesis Defense
Speaker: Ari Kobren

Zoom Meeting: https://umass-amherst.zoom.us/j/94922195175


Intelligent, automated systems that are intertwined with everyday life---such as Google Search and virtual assistants like Amazon's Alexa or Apple's Siri---are often powered in part by knowledge bases (KBs), i.e., structured data repositories of entities, their attributes, and the relationships among them. Despite a wealth of research focused on automated KB construction methods, KBs are inevitably imperfect, with errors stemming from various points in the construction pipeline. Making matters more challenging, new data is created daily and must be integrated with existing KBs so that they remain up-to-date. As the primary consumers of KBs, human users have tremendous potential to aid in KB construction by contributing feedback that identifies spurious and missing entity attributes and relations. However, correctly integrating user feedback with an existing KB is complicated by the need to resolve identity uncertainty, i.e., uncertainty about which real-world entity a piece of data refers to. Identity uncertainty abounds in the collection of raw evidence from which a KB is built. Moreover, it gives rise to identity uncertainty in user feedback when KB entities affected by that feedback are split or merged.

In this dissertation, we present a continuous reasoning framework capable of integrating user feedback with a KB under identity uncertainty. To begin, we introduce Grinch, an online entity resolution (ER) algorithm---with provable correctness guarantees---capable of merging and splitting KB entities as new data arrives. We show that Grinch is efficient and achieves state-of-the-art performance in ER as well as in clustering. Next, we propose a method for using Grinch to resolve identity uncertainty in a KB's underlying data as well as in user feedback. Our approach is based on representing user feedback as mentions, i.e., first-class KB objects that participate in all parts of KB construction. Furthermore, we introduce a structured representation for feedback composed of packaging and payload, which facilitates recovery from KB errors that stem from both identity uncertainty and noisy data. Finally, we evaluate our framework's efficacy using data from the KB that supports OpenReview.net---a deployed conference management system that solicits feedback from users. The demands of OpenReview.net lead us to develop XGrinch (XGS), a variant of Grinch that builds trees with arbitrary branching factors, and subsequently instantiates 60% fewer internal nodes than Grinch. Empirically, we show that XGS is efficient and effectively uses user feedback to improve the correctness and completeness of the OpenReview.net KB.

Advisor: Andrew McCallum