
Teaching AI to Answer Subjective Questions

Helia Hashemi

Voice assistants like Amazon’s Alexa or Google Home reliably answer questions within seconds for millions of users daily—especially questions with simple fact-based answers, such as “What is the diameter of the Earth?” But they don’t do as well with questions whose answers aren’t simple facts. These commonly posed “non-factoid” questions may require an answer that is both useful and subjective, such as “How do you prevent chicken from drying out when you cook it?”

To help develop better answers to questions like these, researchers at the Center for Intelligent Information Retrieval (CIIR) at the College of Information and Computer Sciences, UMass Amherst (CICS) have created ANTIQUE, a dataset of over 34,000 question-answer pairs based on a collection of over 2,600 non-factoid questions asked by real users in community question answering services like Yahoo! Answers.

“As more people ask questions of their phones or devices, we see a growing need to develop methods for these devices to respond well to questions that don’t have simple fact-based answers,” says Helia Hashemi, CICS doctoral student and lead researcher on the project.

To create ANTIQUE, the research team gathered questions drawn from a diverse range of categories. They then collected relevance judgments for all the answers to each question through a careful, multi-stage crowdsourcing process. The resulting dataset promises to benefit researchers working in neural information retrieval, non-factoid question answering, and passage retrieval, Hashemi explains.

Google AI researchers featured the dataset in their tutorial for TF-Ranking, a new open-source TensorFlow package for learning-to-rank, which they first demonstrated at the ACM SIGIR Conference on Research and Development in Information Retrieval in July (SIGIR 2019).

“We chose ANTIQUE over other existing learning-to-rank datasets for the TF-Ranking tutorial because it provides textual content, and not just derived numerical features. This is crucial for demonstrating the power of neural nets for tasks like question answering and passage retrieval,” explains Michael Bendersky, a senior staff software engineer at Google and TF-Ranking team member who received his doctorate from CICS in 2012 after completing his studies at CIIR. “Our GitHub users enthusiastically adopted the ANTIQUE-based tutorial, and we recommend it as a canonical example for anyone who wants to try out TF-Ranking.”

To facilitate further research, the CIIR ANTIQUE team of Hashemi, CIIR Director Bruce Croft, CIIR doctoral student Hamed Zamani, and then-CIIR visiting research scholar Mohammad Aliannejadi of the University of Lugano provides an analysis of the data, as well as baseline results on neural information retrieval models, in their paper, “ANTIQUE: A Non-Factoid Question Answering Benchmark.”

A hands-on demonstration of TF-Ranking is available on GitHub. Both the Google-created TF-Ranking processed files and the CIIR ANTIQUE dataset are available on the CIIR website.

This article was previously titled "CIIR Researchers Release ANTIQUE, a Dataset to Help AI Answer Questions with Subjective Answers."