Statler College of Engineering and Mineral Resources


Lane Department of Computer Science and Electrical Engineering

Committee Chair

Lan Guo


Lung cancer is the deadliest cancer worldwide. Current lung cancer prognosis and treatment are based on tumor-stage population statistics and cannot reliably assess the risk of recurrence in individual patients. Biomarkers enable treatment options to be tailored to individual patients based on the molecular characteristics of their tumors. To date, there is no clinically applied molecular prognostic model for lung cancer. Statistical and feature selection methods identify gene candidates by ranking the association between gene expression and disease outcome, but do not account for interactions among genes. Computational network methods can model these interactions, but have not been used for gene selection because of their computational inefficiency. Moreover, the curse of dimensionality in human genome data poses additional computational challenges for these methods.

We proposed two hybrid systems for identifying prognostic gene signatures for lung cancer from gene expression measured with DNA microarrays. The first hybrid system combined t-tests, Significance Analysis of Microarrays (SAM), and Relief feature selection in multiple gene-filtering layers. This combinatorial system identified a 12-gene signature with better prognostic performance than published signatures in treatment selection for stage I and II patients (log-rank P<0.04, Kaplan-Meier analyses). The 12-gene signature was a more significant prognostic factor (hazard ratio=4.19, 95% CI: [2.08, 8.46], P<0.00006) than the clinical covariates. Functional pathway analyses showed that the signature genes are involved in tumorigenesis.

The second proposed system employed a novel computational network model, namely implication networks based on prediction logic. This network-based system used gene coexpression networks and concurrent coregulation with signaling pathways for biomarker identification. The first application of the system modeled disease-mediated genome-wide coexpression networks. The entire genomic space was extensively explored, and 21 gene signatures were discovered with better prognostic performance than all published signatures in stage I patients not receiving chemotherapy (hazard ratio>1, CPE>0.5, P<0.05). These signatures could potentially be used to select patients for adjuvant chemotherapy. The second application of the system modeled smoking-mediated coexpression networks and identified a smoking-associated 7-gene signature. The 7-gene signature generated significant prognostication specific to lung cancer patients who smoke (log-rank P<0.05, Kaplan-Meier analyses), with implications for diagnostic screening of lung cancer risk in smokers (overall accuracy=74%, P<0.006). The coexpression patterns derived from the implication networks in both applications were successfully validated against molecular interactions reported in the literature (FDR<0.1).

Our studies demonstrated that hybrid systems with multiple gene selection layers outperform traditional methods. Moreover, implication networks can efficiently model genome-scale disease-mediated coexpression networks and crosstalk with signaling pathways, leading to the identification of clinically important gene signatures.
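
The idea of stacking gene-filtering layers, as in the first hybrid system, can be illustrated with a minimal sketch. This is not the pipeline used in the thesis: it assumes only two layers (a Welch t-statistic filter followed by a basic Relief ranking), omits the SAM layer, and runs on synthetic data. The function names, the cut-offs `keep_t` and `keep_relief`, and the simulated expression matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def welch_t(X, y):
    """Absolute Welch t-statistic per gene; X is samples x genes, y is 0/1."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

def relief_scores(X, y):
    """Basic Relief weights: reward distance to the nearest sample of the
    other class (miss), penalise distance to the nearest sample of the
    same class (hit). Features are range-scaled; distance is Manhattan."""
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # guard against constant genes
    Z = (X - X.min(axis=0)) / span
    w = np.zeros(X.shape[1])
    for i in range(len(Z)):
        d = np.abs(Z - Z[i]).sum(axis=1)
        d[i] = np.inf                          # a sample is not its own neighbour
        same = y == y[i]
        hit = np.where(same, d, np.inf).argmin()
        miss = np.where(~same, d, np.inf).argmin()
        w += np.abs(Z[i] - Z[miss]) - np.abs(Z[i] - Z[hit])
    return w / len(Z)

def layered_filter(X, y, keep_t=50, keep_relief=5):
    """Layer 1 keeps the top genes by |t|; layer 2 re-ranks survivors with Relief."""
    layer1 = np.argsort(welch_t(X, y))[::-1][:keep_t]
    scores = relief_scores(X[:, layer1], y)
    return layer1[np.argsort(scores)[::-1][:keep_relief]]

# Synthetic data (an assumption for illustration): 60 samples, 500 genes,
# with genes 0-4 shifted in the second outcome group.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 500))
X[y == 1, :5] += 4.0
selected = layered_filter(X, y)
print(sorted(selected.tolist()))
```

The univariate first layer cheaply removes most uninformative genes, so the neighbour-based Relief layer only has to rank a small candidate set. A real analysis would add the remaining filtering layer and validate the resulting signature on independent cohorts with survival analysis.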