PhD Dissertation Proposal Defense: Xi Chen, Leveraging Large Language Models to Mine and Explain Global Media Coverage and Agenda Setting
Content
Speaker
Abstract
Global media plays a pivotal role in shaping public perception through selective coverage and framing of events, a process central to agenda-setting theory. However, understanding how media influences public discourse across countries and languages remains a significant challenge, particularly at scale. With advancements in large language models (LLMs), there is an opportunity to leverage media text to systematically identify events and analyze agenda-setting processes on a global scale with minimal input data. This dissertation aims to utilize LLMs to mine, explain, and characterize global media coverage.
First, using a human-in-the-loop framework that improves the quality of sampled news article pairs for annotation, I develop a comprehensive multilingual news similarity dataset. The dataset is annotated with aspects corresponding to the two levels of agenda-setting theory, evaluating news content as well as its framing and tone. To the best of my knowledge, this is the largest labeled dataset of multilingual news article similarity to date, containing 26,555 labeled news article pairs across 10 languages.
Second, I propose a framework to identify global news events by: (i) developing a transformer-based model for multilingual news similarity fine-tuned on the developed news similarity dataset; (ii) constructing a global event identification system that clusters news articles based on their similarity network; and (iii) measuring news synchrony across countries and diversity within countries, based on their coverage of global events. This framework is designed to scale efficiently to millions of news articles, enabling both the identification of country communities displaying common patterns in news coverage and the quantification of socio-political and economic factors influencing news production.
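As a rough illustration of step (ii), the sketch below groups articles into events by linking pairs whose similarity exceeds a threshold and taking connected components of the resulting network. This is a minimal sketch under stated assumptions, not the system described in the proposal: the abstract does not name the clustering algorithm, so connected components is a stand-in, and the function names, the 0.8 threshold, and the toy embeddings are all hypothetical. The all-pairs comparison is quadratic and would need an approximate nearest-neighbor search to reach millions of articles.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_events(embeddings, threshold=0.8):
    """Group article indices into events.

    Assumption: an "event" is a connected component of the similarity
    network whose edges join article pairs scoring >= threshold.
    """
    parent = list(range(len(embeddings)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Add an edge (merge components) for every sufficiently similar pair.
    for i, j in combinations(range(len(embeddings)), 2):
        if cosine(embeddings[i], embeddings[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(embeddings)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Made-up 2-d vectors standing in for article embeddings that a
# fine-tuned multilingual similarity model would produce.
toy_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]
events = cluster_events(toy_embeddings)
print(events)
```

On the toy input, articles 0 and 1 fall into one event and articles 2 and 3 into another. Cross-country synchrony could then be computed from which countries' outlets appear in each cluster.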
Third, I leverage an LLM-based event coverage identification method that requires no training data to identify news about specific natural disasters and terrorist attacks. A case study using this method reveals news coverage patterns consistent with prior literature: media coverage of disasters and terrorist attacks correlates with death tolls, the GDP of the affected country, and trade volume between the reporting country and the country where the event occurred.
Finally, I aim to extend this analysis beyond traditional media to social media. I will propose a framework for identifying global events, such as elections, from social media data, and then analyze public discourse about those events. This work will provide a more comprehensive view of how public discourse evolves across different forms of media.
Advisor
Przemyslaw Grabowicz