The University of Massachusetts Amherst
Manning College of Information & Computer Sciences


PhD Thesis Defense: Avijit Mitra, Advancing the Knowledge of Social and Behavioral Determinants of Health Using AI in Health Outcome Research


Monday, August 25, 2025, 2:00 PM to 4:00 PM

Online
PhD Thesis Defense
Presentation

Speaker

Avijit Mitra

Abstract

Social and behavioral determinants of health (SBDH) are critical to understanding health outcomes and to designing effective clinical decision support systems (CDSS). Despite their importance, SBDH information is often embedded in unstructured clinical data, which limits its use in research and practice. With the widespread adoption of electronic health record (EHR) systems in the United States, natural language processing (NLP) has emerged as a promising approach to harnessing this relatively untapped resource. This dissertation explores how NLP can be used to extract, analyze, and apply SBDH information to advance clinical insight. We address three key research areas: (1) advancing SBDH extraction, (2) elucidating the role of SBDH in critical health outcomes, and (3) integrating SBDH information into CDSS.

To address the need for high-quality public datasets for developing effective SBDH extraction systems, we propose Synth-SBDH, a novel synthetic dataset designed to improve NLP-based extraction of SBDH from clinical text. Generated with a large language model (LLM) via in-context learning, Synth-SBDH provides detailed annotations across 15 SBDH categories, with each annotation including status, temporal information, and rationale. We demonstrate the dataset's utility in real-world applications, with notable gains in macro-F scores, particularly for rare SBDH categories. Furthermore, we show Synth-SBDH's potential to improve model generalizability and to support model training in constrained settings, paving the way for broader applications in clinical contexts. We propose to make our framework more generalizable and performant across domains by incorporating LLMs and leveraging a larger, more diverse synthetic clinical SBDH dataset.
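Macro-F averages the per-category F1 scores with equal weight, so gains on rare SBDH categories move it much more than they move overall accuracy. The sketch below illustrates this under simplifying assumptions. Each SBDH mention carries exactly one category label. The category names (`employment`, `housing`) and counts are hypothetical toy values. `macro_f1` is an illustrative helper, not the evaluation code used in this work.

```python
from collections import Counter


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-category F1 (assumes one label per mention)."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted category gets a false positive
            fn[g] += 1  # the true category gets a false negative
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN), defined as 0 when the category never appears
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: "housing" is the rare category.
gold = ["employment"] * 8 + ["housing"] * 2
pred_a = ["employment"] * 10                             # misses both rare mentions
pred_b = ["employment"] * 8 + ["housing", "employment"]  # recovers one of them

print(round(macro_f1(gold, pred_a), 3))  # 0.444
print(round(macro_f1(gold, pred_b), 3))  # 0.804
```

With these toy labels, recovering one of the two rare `housing` mentions raises accuracy only from 0.8 to 0.9. Over the same change, macro-F1 rises from about 0.44 to about 0.80.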

Next, we developed robust multi-task learning frameworks to extract SBDH from unseen EHR text and used them to study the association between SBDH and critical health outcomes. We began by investigating the role of NLP-extracted SBDH in understanding risk factors for opioid overdose (OOD) and suicide. Using clinical data from intensive care unit (ICU) admissions, we identified significant associations between nonfatal OOD and several SBDH factors, such as illicit drug use and insurance status. Our findings also underscore the substantial gap between the SBDH information captured by ICD codes and the much richer information extractable through NLP, highlighting the potential of NLP to fill gaps in observational research and clinical care.
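To make quantitative sense of an association between an SBDH factor and an outcome, the sketch below computes a crude (unadjusted) odds ratio with a Woolf-style 95% confidence interval from a 2x2 table. This is a swapped-in textbook illustration, not the study's method. Observational analyses like the ones described here typically adjust for confounders. The counts and the helper name `odds_ratio_ci` are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Crude odds ratio with a Wald (Woolf) confidence interval from a 2x2 table.

    a: exposed, outcome present     b: exposed, outcome absent
    c: unexposed, outcome present   d: unexposed, outcome absent
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (no continuity correction applied)")
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: exposure = an NLP-extracted SBDH factor, outcome = nonfatal OOD.
or_, lower, upper = odds_ratio_ci(a=30, b=70, c=20, d=180)
print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these toy counts the crude odds ratio is 27/7, about 3.86. The whole interval lies above 1, which is what a significant positive association looks like in this simple unadjusted setting.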

Similarly, using veteran health data, we found significant associations between NLP-enriched social determinants of health (SDOH) and critical health outcomes such as suicide and fatal OOD. These studies validate the utility of NLP for uncovering nuanced SBDH details that are often overlooked, thereby improving our understanding of the role of SBDH in critical health outcomes.

Finally, we demonstrated the integration of NLP-extracted SBDH into predictive modeling frameworks to support CDSS. Our research shows that incorporating SBDH into suicide prediction models significantly improves performance metrics across multiple prediction time windows. Similarly, for fatal OOD prediction, both traditional machine learning and deep learning models, including LLM-based models, benefited from the inclusion of SDOH predictors. These results highlight the value of NLP-enriched SBDH for informing healthcare practitioners and policymakers and for driving data-informed interventions.
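The abstract does not name the performance metrics used. Area under the ROC curve (AUROC) is one common choice for risk prediction of this kind. The sketch below assumes AUROC and shows how a baseline model and an SBDH-augmented model could be compared on the same patients within one prediction window. The scores, labels, and helper name `auroc` are hypothetical. The quadratic pairwise loop favors clarity over speed.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win, which equals the Mann-Whitney U statistic
    divided by (n_pos * n_neg).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for the same six patients in one prediction window.
labels = [1, 1, 0, 0, 0, 0]
baseline = [0.6, 0.3, 0.4, 0.2, 0.5, 0.1]   # model without SBDH features
with_sbdh = [0.7, 0.6, 0.4, 0.2, 0.5, 0.1]  # same model plus NLP-extracted SBDH features

print(auroc(labels, baseline))   # 0.75
print(auroc(labels, with_sbdh))  # 1.0
```

Repeating this comparison for each prediction window would mirror the multi-window evaluation described above.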

Advisor

Hong Yu



Join via Zoom

Address

140 Governors Dr
Amherst, MA 01003
United States



  • ©2025 University of Massachusetts Amherst