PhD Thesis Defense: Avijit Mitra, Advancing the Knowledge of Social and Behavioral Determinants of Health Using AI in Health Outcome Research
Speaker
Avijit Mitra
Abstract
Social and behavioral determinants of health (SBDH) are critical to understanding health outcomes and designing effective clinical decision support systems (CDSS). Despite their importance, SBDH information is often embedded in unstructured clinical data, limiting its utility in research and practice. With the widespread adoption of electronic health record (EHR) systems in the United States, natural language processing (NLP) has emerged as a promising approach to harness this relatively untapped resource. This dissertation explores the potential of NLP to extract, analyze, and apply SBDH information to advance clinical insights. We address three key research areas: (1) advancing SBDH data extraction, (2) elucidating the role of SBDH in critical health outcomes, and (3) integrating SBDH information into CDSS.
To address the need for high-quality public datasets to support the development of effective SBDH extraction systems, we propose Synth-SBDH, a novel synthetic dataset designed to improve NLP-based extraction of SBDH from clinical text. Generated using a large language model (LLM) via in-context learning, Synth-SBDH offers detailed annotations across 15 SBDH categories, including status, temporal information, and rationale. Our work demonstrates the dataset's utility in real-world applications, improving the extraction of rare SBDH categories as reflected in gains in macro-F scores. Furthermore, we highlight Synth-SBDH’s potential to enhance model generalizability and support the training of models under constrained scenarios, paving the way for broader applications in clinical contexts. We propose to make our framework more generalizable and performant across different domains by incorporating LLMs and leveraging a larger and more diverse synthetic clinical SBDH dataset.
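The emphasis on macro-averaged F scores for rare categories can be illustrated with a small sketch. It assumes a simplified single-label classification setting (the actual extraction task annotates text spans and is not reproduced here) and uses the F1 variant of the F score. The label names and the toy gold/predicted labels are hypothetical, chosen only to show that a model ignoring a rare category can score well on micro-averaged F1 yet poorly on macro-averaged F1.

```python
def per_class_f1(gold, pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(gold) | set(pred))
    scores = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = (2 * tp / denom) if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1: every category counts equally."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


def micro_f1(gold, pred):
    """For single-label classification, micro-F1 equals accuracy."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


# Hypothetical toy data: one frequent category and one rare category.
gold = ["insurance_status"] * 8 + ["illicit_drug_use"] * 2
# A model that never predicts the rare category.
pred = ["insurance_status"] * 10

macro = macro_f1(gold, pred)
micro = micro_f1(gold, pred)
print(f"macro-F1 = {macro:.3f}, micro-F1 = {micro:.3f}")
```

Because macro averaging weights each category equally, the missed rare category pulls the macro score down sharply while the micro score stays high, which is why macro-F is a natural metric for reporting gains on rare SBDH categories.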
Next, we developed robust multi-task learning frameworks to extract SBDH from unseen EHR text and leveraged them to explore the association between SBDH and critical health outcomes. We began by investigating the role of NLP-extracted SBDH in understanding risk factors for opioid overdose (OOD) and suicide. Using clinical data from intensive care unit admissions, we identified significant associations between several SBDH factors, such as illicit drug use and insurance status, and nonfatal OOD. Our findings also underscore the substantial disparity between ICD-coded SBDH information and the wealth of data extractable through NLP, highlighting the transformative potential of NLP in filling gaps in observational research and clinical care.
Similarly, using veteran health data, we found significant associations between NLP-enriched social determinants of health (SDOH) and critical health outcomes such as suicide and fatal OOD. These studies validate the utility of NLP in uncovering nuanced SBDH details that are often overlooked, thereby enhancing our understanding of the role of SBDH in critical health outcomes.
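As a rough illustration of what an association between one SBDH factor and an outcome can look like, the sketch below computes an unadjusted odds ratio with a Wald (Woolf) 95% confidence interval from a 2x2 table. This is an assumption made for illustration only: the abstract does not state which statistical models were used, and the counts below are invented, not results from the dissertation.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: factor present, outcome present
    b: factor present, outcome absent
    c: factor absent,  outcome present
    d: factor absent,  outcome absent
    All four cells must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts: patients with and without an NLP-extracted SBDH factor,
# split by whether the outcome occurred.
or_value, ci_low, ci_high = odds_ratio_ci(a=30, b=70, c=10, d=90)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

A confidence interval that excludes 1 is the usual sign of a statistically significant unadjusted association; an observational study would typically also adjust for confounders, which this sketch does not attempt.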
Finally, we demonstrated the integration of NLP-extracted SBDH into predictive modeling frameworks to support CDSS. Our research shows that incorporating SBDH into suicide prediction models significantly improves performance metrics across multiple prediction time windows. Similarly, in the context of fatal OOD prediction, both traditional machine learning and deep learning models, including those based on LLMs, benefited from the inclusion of SDOH predictors. These results highlight the utility of NLP-enriched SBDH to inform healthcare practitioners and policymakers, driving data-informed interventions.
Advisor
Hong Yu