Computational Methods for Identification of Molecular Signatures

Author

Weijun Yi

Semester

Spring

Date of Graduation

2026

Document Type

Dissertation

Degree Type

PhD

College

Statler College of Engineering and Mineral Resources

Department

Lane Department of Computer Science and Electrical Engineering

Committee Chair

Donald Adjeroh

Committee Co-Chair

Gangqing Hu

Committee Member

Gianfranco Doretto

Committee Member

Jeremy Dawson

Committee Member

Elaine Eschen

Committee Member

Sijin Wen

Abstract

This work develops computational methods for identifying molecular signatures from high-throughput genomic data and for modeling long non-coding RNA (lncRNA) sub-cellular localization. The response of multiple myeloma to CB-6644, a selective RUVBL1/2 complex inhibitor with potential anti-tumor activity, is analyzed to identify drug-responsive pathways and molecular signatures. Conventional gene set enrichment analysis (GSEA) often excludes low-expression genes. Here, phenotype comparison is reformulated as a supervised machine learning problem: the genes most informative for discriminating between phenotypes are first selected using a machine learning approach, and GSEA is then applied to these machine-learning-derived gene sets. This framework improves detection of CB-6644-associated pathways. For lncRNAs, localization signatures are first studied directly from the RNA sequence. Inexact q-mer representations are analyzed for sub-cellular localization prediction, cell-type specificity, and localization-switching lncRNAs. The analyses identify localization-associated sequence segments (signatures), show that part of the signal is cell-type dependent, and indicate that 5’ transcript regions contain stronger localization signals than 3’ regions. A graph neural network (GNN) framework for lncRNA localization is then developed in which each transcript is represented as a graph whose nodes are sequence windows and whose edges encode both local adjacency and non-local sequence similarity, while an optional global branch captures transcript-level context. Across benchmark tasks, the GNN achieves competitive performance and shows that graph structure provides useful contextual refinement beyond strong sequence features. Perturbation-based interpretation further highlights biologically plausible sequence windows, including nuclear-retention-like C-rich motifs.
Together, these studies show that machine learning can recover molecular and localization signatures across multiple levels of biological organization, from gene sets to transcript motifs to graph-structured representations. The dissertation contributes new methods for drug-response signature discovery, lncRNA sub-cellular localization prediction, and interpretable sequence modeling, advancing computational approaches for functional genomics.
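
As an illustration of the transcript graph described in the abstract (nodes are sequence windows, edges encode local adjacency and non-local sequence similarity), the following is a minimal sketch and not the dissertation's implementation. The function names, the window and stride lengths, the use of k-mer count profiles with cosine similarity, and the similarity threshold are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_profile(window, k=3):
    """Count overlapping k-mers in one sequence window."""
    return Counter(window[i:i + k] for i in range(len(window) - k + 1))


def cosine(a, b):
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_window_graph(seq, window=6, stride=6, k=3, min_sim=0.8):
    """Build a toy transcript graph (all parameter values are hypothetical).

    Nodes are fixed-length windows of the sequence. Consecutive windows get
    an "adjacent" edge; non-consecutive windows whose k-mer profiles have
    cosine similarity >= min_sim get a "similar" edge.
    """
    starts = range(0, len(seq) - window + 1, stride)
    nodes = [seq[s:s + window] for s in starts]
    profiles = [kmer_profile(node, k) for node in nodes]

    # Local adjacency edges between neighbouring windows.
    edges = {(i, i + 1): "adjacent" for i in range(len(nodes) - 1)}

    # Non-local similarity edges between windows that are not neighbours.
    for i, j in combinations(range(len(nodes)), 2):
        if j - i > 1 and cosine(profiles[i], profiles[j]) >= min_sim:
            edges[(i, j)] = "similar"
    return nodes, edges
```

For the toy sequence "ACGUACGGGGGGACGUAC", build_window_graph returns three windows, with adjacency edges (0, 1) and (1, 2) and one similarity edge (0, 2) linking the two identical windows. A real pipeline would attach feature vectors to the nodes and pass the graph to a GNN library; that step is omitted here.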

Recommended Citation

Yi, Weijun, "Computational Methods for Identification of Molecular Signatures" (2026). Graduate Theses, Dissertations, and Problem Reports. 13300.
https://researchrepository.wvu.edu/etd/13300

Included in

Other Biomedical Engineering and Bioengineering Commons, Other Computer Engineering Commons

DOI

https://doi.org/10.33915/etd.13300
