Retrieval Models based on Linguistic Features of Verbose Queries

Thursday, 08/21/2014 9:00am to 11:00am
Ph.D. Seminar

Jae Hyun Park

Computer Science Building, Room 151

Expressing a query in natural language is more familiar to users than choosing keywords, so people can use natural language expressions to represent sophisticated information needs. A verbose query is not a list of keywords but a grammatically well-formed phrase or sentence in which terms are used together to convey the more specific meaning of a concept, and the relationships among these concepts are expressed by function words.

The goal of this thesis is to investigate methods of using the semantic and syntactic features of natural language queries to maximize the effectiveness of search. For this purpose, we propose a synchronous framework in which we use syntactic parsing techniques to model term dependencies. We use the Generative Relevance Hypothesis (GRH) to evaluate valid variations in dependence relationships between queries and documents. This is one of the first results demonstrating that dependency parsing can be used to improve retrieval effectiveness.

We propose a method for classifying the concepts in verbose queries as key concepts or secondary concepts; this classification is then used in a statistical translation model for query term expansion. Key concepts are the most important terms of a query, and we use them as the context for translating terms. Although secondary concepts are not as important as key concepts, they still matter because they provide clues about what kinds of information users are looking for. Based on the concept classification results, we develop a translation model in which terms are selectively translated according to the most important context of a given query or question.

We define the important new task of focused retrieval of answer passages, which aims to provide immediate answers to users' information needs. The length of an answer passage should be suitable for restricted search environments such as mobile devices and voice-based search systems.

Advisor: W. Bruce Croft