Author ORCID Identifier

https://orcid.org/0000-0002-1054-8796

Semester

Fall

Date of Graduation

2022

Document Type

Dissertation

Degree Type

PhD

College

Statler College of Engineering and Mineral Resources

Department

Lane Department of Computer Science and Electrical Engineering

Committee Chair

Nancy Lan Guo

Committee Co-Chair

Donald Adjeroh

Committee Member

Donald Adjeroh

Committee Member

Katerina Goseva-Popstojanova

Committee Member

Michael Hu

Committee Member

Xin Li

Abstract

Lung cancer has the second-highest incidence and the highest mortality of all cancers worldwide. The American Cancer Society estimates that in 2022 there will be about 236,740 new lung cancer cases (117,910 in men and 118,830 in women) in the US. To date, there are no prognostic or predictive biomarkers for selecting chemotherapy, immunotherapy, or radiotherapy for individual non-small cell lung cancer (NSCLC) patients. There is an unmet clinical need to identify patients with early-stage NSCLC who are likely to develop recurrence and to predict their therapeutic responses. This dissertation developed a novel computational methodology for modeling molecular gene association networks from DNA copy number variation, gene expression, protein expression, and single-cell gene expression data in NSCLC, and for discovering novel biomarkers and therapeutic targets. It makes the following technical and theoretical contributions. First, the Boolean implication network algorithm based on prediction logic was given a practical extension. Boolean implication networks are probabilistic graphical models that express the relationship between two variables, and they have conceptual advantages over existing methodologies. This dissertation extended the Boolean implication network to model multinary rather than binary data and to construct multi-omics and single-cell omics gene regulatory networks (GRNs). Several harmonization techniques were adopted to obtain compatible data, making it possible to build cross-level multi-omics networks in multiple cohorts from different platforms. Second, an innovative data-driven pipeline was developed for biomarker discovery and therapeutic target identification, further exploiting the information contained in the constructed Boolean implication networks. Novel prognostic genes and proliferation genes were found, and functional pathways, targeted therapies, and repositioning drugs were discovered based on the identified genes. The developed framework can be applied to any disease with sufficient data. Third, a landscape evaluation was conducted of the biological and clinical relevance of multi-omics and single-cell Boolean implication network centralities in NSCLC tumors, rigorously quantified with graph-theoretic centrality metrics. This is the first systematic demonstration of the association between multi-omics network centralities and NSCLC tumorigenesis, proliferation, and patient survival. The evaluation shows that gene centrality metrics in a GRN can be used to prioritize candidate biomarkers and drug targets. In the future, the results of this dissertation can be tested through biological verification or confirmation against experimental results, helping to identify genes that play an essential role in the cause and progression of NSCLC and to find potential drugs for its treatment.

Embargo Reason

Publication Pending
