Content

Speaker

Prasanna Lakkur

Abstract

This research advances methodologies for analyzing textual content from large-scale online datasets, interpreting message intent, retrieving knowledge, and leveraging these insights to address practical challenges across diverse domains. First, two complementary approaches are used to improve Temporal Question Answering. TempoQR uses a RAG-inspired approach to enhance the question representation with relevant facts from a large knowledge graph. LASR, on the other hand, focuses on understanding the intent of the question in order to pick the best SPARQL query for answering it. Building upon this work, intent classification is then used to classify messages with labels of interest to law enforcement, helping them triage massive corpora of conversations. The classification model also enables a novel conversation clustering technique that reveals distinct conversational patterns and provides insights into different victim experiences.

The focus then turns to understanding opinions in large-scale online communities. A self-supervised approach models collective community opinion by leveraging readily available data. This enables two novel techniques for community comparison: BOTS, which calculates similarity based on expressed opinions, and Emb-PSR, which calculates similarity based on content. Results demonstrate that both BOTS and Emb-PSR outperform existing methods in their respective tasks and facilitate cross-platform comparisons between communities. These methods are further enhanced through the integration of RAG to improve opinion modeling accuracy, particularly in smaller communities.

Finally, I use this enhanced model to examine dynamics in online communities, and I introduce a new dataset to validate the approach. Together, this work provides tools for understanding user intent and online dynamics.