Semester

Spring

Date of Graduation

2026

Document Type

Dissertation

Degree Type

PhD

College

Statler College of Engineering and Mineral Resources

Department

Lane Department of Computer Science and Electrical Engineering

Committee Chair

Donald Adjeroh

Committee Co-Chair

Gangqing Hu

Committee Member

Gianfranco Doretto

Committee Member

Jeremy Dawson

Committee Member

Elaine Eschen

Committee Member

Sijin Wen

Abstract

This work develops computational methods for identifying molecular signatures from high-throughput genomic data and for modeling long non-coding RNA (lncRNA) sub-cellular localization. The response of multiple myeloma to CB-6644, a selective RUVBL1/2 complex inhibitor with potential anti-tumor activity, is analyzed to identify drug-responsive pathways and molecular signatures. Conventional gene set enrichment analysis (GSEA) workflows often exclude low-expression genes. Here, phenotype comparison is reformulated as a supervised machine learning problem: the genes most informative for discriminating the phenotypes are first selected by a machine learning model, and GSEA is then applied to these machine-learning-derived gene sets. This framework improves detection of CB-6644-associated pathways. For lncRNAs, localization signatures are first studied directly from the RNA sequence. Inexact q-mer representations are analyzed for sub-cellular localization prediction, cell-type specificity, and localization-switching lncRNAs. The analyses identify localization-associated sequence segments (signatures), show that part of the signal is cell-type dependent, and indicate that 5’ transcript regions contain stronger localization signals than 3’ regions. Next, a graph neural network (GNN) framework for lncRNA localization is developed in which each transcript is represented as a graph whose nodes are sequence windows and whose edges encode both local adjacency and non-local sequence similarity, while an optional global branch captures transcript-level context. Across benchmark tasks, the GNN achieves competitive performance, and the results show that graph structure provides useful contextual refinement beyond strong sequence features. Perturbation-based interpretation further highlights biologically plausible sequence windows, including nuclear-retention-like C-rich motifs.
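
The window-graph representation described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the window length, stride, use of exact 3-mer count profiles as node features, cosine similarity, and the 0.9 similarity threshold are all hypothetical choices made only to show how local-adjacency edges and non-local similarity edges can coexist in one transcript graph. The optional global branch is omitted.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"


def kmer_profile(seq, k=3):
    """Hypothetical node feature: fixed-order vector of exact k-mer counts."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(ALPHABET, repeat=k)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_window_graph(seq, window=20, stride=10, k=3, sim_threshold=0.9):
    """Split a transcript into windows (nodes) and connect them with
    local-adjacency edges and non-local similarity edges.

    Simplification: tail bases that do not fill a final stride are ignored.
    """
    starts = range(0, max(len(seq) - window, 0) + 1, stride)
    windows = [seq[s:s + window] for s in starts]
    features = [kmer_profile(w, k) for w in windows]
    edges = set()
    # Local edges: consecutive windows along the transcript.
    for i in range(len(windows) - 1):
        edges.add((i, i + 1, "local"))
    # Non-local edges: non-adjacent windows with similar k-mer composition.
    for i in range(len(windows)):
        for j in range(i + 2, len(windows)):
            if cosine(features[i], features[j]) >= sim_threshold:
                edges.add((i, j, "similar"))
    return windows, features, sorted(edges)


if __name__ == "__main__":
    demo = "ACGT" * 5 + "C" * 20 + "ACGT" * 5
    _, _, demo_edges = build_window_graph(demo, window=20, stride=20)
    print(demo_edges)
```

On the toy transcript, the first and last windows share identical composition, so they are linked by a "similar" edge even though they are not neighbors, while the C-only middle window is connected only through local edges. A GNN would then propagate information over both edge types.
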
Together, these studies show that machine learning can recover molecular and localization signatures across multiple levels of biological organization, from gene sets to transcript motifs to graph-structured representations. The dissertation contributes new methods for drug-response signature discovery, lncRNA sub-cellular localization prediction, and interpretable sequence modeling, advancing computational approaches for functional genomics.
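
The perturbation-based interpretation mentioned in the abstract can be sketched as simple occlusion: mask one sequence window at a time and record how much the model's score drops. This is an illustrative sketch under stated assumptions, not the dissertation's procedure. The scoring function `toy_c_richness` is a hypothetical stand-in for a trained localization model, and the mask character, window size, and stride are likewise assumptions.

```python
def toy_c_richness(seq):
    """Hypothetical stand-in for a trained model: fraction of C bases."""
    return seq.count("C") / len(seq) if seq else 0.0


def occlusion_importance(seq, score_fn, window=10, stride=5, mask_char="N"):
    """Return (start, importance) pairs, where importance is the drop in
    score_fn's output when that window is replaced by mask_char."""
    base = score_fn(seq)
    importances = []
    for start in range(0, max(len(seq) - window, 0) + 1, stride):
        span = len(seq[start:start + window])  # keeps length if seq < window
        masked = seq[:start] + mask_char * span + seq[start + span:]
        importances.append((start, base - score_fn(masked)))
    return importances


if __name__ == "__main__":
    demo = "A" * 20 + "C" * 10 + "A" * 20
    scores = occlusion_importance(demo, toy_c_richness, window=10, stride=10)
    print(max(scores, key=lambda t: t[1]))
```

With a real model, windows whose masking causes the largest drop are the candidates inspected for motifs such as C-rich nuclear-retention-like elements. Masking preserves sequence length so that positional structure is not shifted.
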
