Author ORCID Identifier

https://orcid.org/0009-0003-3996-2776

Semester

Spring

Date of Graduation

2025

Document Type

Dissertation

Degree Type

PhD

College

Eberly College of Arts and Sciences

Department

Chemistry

Committee Chair

Mark Tinsley

Committee Co-Chair

Blake Mertz

Committee Member

Stephen Valentine

Committee Member

Hacer Karatas Bristow

Committee Member

Srinjoy Das

Abstract

Studies of protein-ligand binding are a significant focus in the fields of medicinal chemistry

and biochemistry, facilitating a greater understanding of protein structure-function relationships

and enabling control of protein function via drug development. In this research, we identify

new binding sites for lipids on the microbial membrane protein, proteorhodopsin, and apply

machine learning approaches to accelerate the generation of binding affinity-like data for

use in the identification of potential small-molecule drugs for the platelet activating factor

receptor. Cardiolipin, a membrane lipid, and proteorhodopsin, a microbial proton pump,

are both commonly found in microbial membranes, but little is known about their putative

interactions. Using the coarse-grained Martini force field, molecular dynamics (MD) simulations

were carried out on μs time scales to model the lateral interactions of cardiolipin and

proteorhodopsin in a bilayer environment. We identified two potential cardiolipin

binding sites with long-lived residence times. Both of these binding sites are located near regions

critical to proteorhodopsin function, suggesting that cardiolipin may play a role in the

ability of proteorhodopsin to pump protons across the outer membrane. The second area of

research was the application of machine learning approaches as a substitute for the screening

of small-molecule hits on the platelet activating factor receptor. Graph neural networks

with attentive mechanisms were trained on a 50,000-ligand library using structural features

and docking scores as input. Several ML techniques were applied to the generation of the

GNN models, identifying the combination of features that maximized accurate prediction of

docking scores while also prioritizing efficiency. Although there were issues with overfitting

using a relatively small dataset, this GNN model in combination with active learning has the

potential to accelerate screening of ligand libraries. Further development of this approach

is likely to improve the accuracy of screening as well.
