Speaker:

Nelson Evbarunegbe

Abstract:

The ability to accurately predict molecular properties aids modern drug discovery, offering a computational route to identify compounds with desired biological and physicochemical behaviors before costly laboratory experiments. Among the most critical yet underexplored properties is the ability of small molecules to permeate the complex mycomembrane of Mycobacterium tuberculosis (Mtb), the causative agent of tuberculosis (TB). TB remains one of the world’s leading infectious diseases, and the scarcity of new anti-tubercular agents is largely attributed to the slow growth of the pathogen and the difficulty of identifying molecules that can successfully penetrate its unique cell envelope.

This dissertation explores machine learning (ML) techniques for molecular property prediction, with a focus on modeling and understanding mycomembrane permeability. First, we propose MycoPermeNet, a graph-based deep learning model designed to learn the intrinsic relationship between molecular structure and permeability across the Mtb membrane. MycoPermeNet not only achieves robust predictive performance but also provides interpretable chemical insights, identifying key scaffolds and molecular fragments that promote permeability.

Recognizing potential limitations in generalizing to out-of-distribution and chemically novel compounds, we propose to extend our approach with MycoPermeNet v2, which integrates multi-level feature representations combining global and local molecular information. This enhanced architecture significantly improves prediction accuracy and generalization. We validate the generalizability of our approach by demonstrating superior performance across multiple benchmark datasets from the MoleculeNet library.

To bridge computational modeling with experimental validation, we propose to build an active learning framework that enables our models to operate effectively “in the loop” with the laboratory. We propose to benchmark active learning strategies for molecular property prediction comprehensively and to test whether enriching the feature representation space via feature fusion leads to more efficient and targeted molecular selection. This strategy aims to yield high-confidence compound recommendations for laboratory testing, guiding permeability assays toward the most promising candidates.

Finally, to contextualize permeability mechanisms within the Mycobacterium genus, we propose a comparative cheminformatic analysis across four related species. This study uncovers chemical scaffolds, molecular regions, and physicochemical properties that influence permeability differentially across species, providing new insights into cross-species drug design and a deeper understanding of Mtb, the species of interest.

Collectively, this dissertation establishes a comprehensive computational framework that draws on both chemical and biological domain knowledge for modeling, interpreting, and leveraging molecular properties, with mycobacterial permeability as a case study. The methodologies developed herein (graph-based modeling, feature fusion, and active learning) advance both the predictive power and interpretability of ML-driven drug discovery, contributing a valuable foundation for rational antibiotic compound design.

Advisor:

Anna Green