Date of Graduation


Document Type


Degree Type



Statler College of Engineering and Mineral Resources


Lane Department of Computer Science and Electrical Engineering

Committee Chair

Nancy Lan Guo

Committee Co-Chair

Donald Adjeroh

Committee Member

Donald Adjeroh

Committee Member

Saiph Savage


Lung cancer is the leading cause of cancer-related death worldwide. It is categorized as either non-small cell lung cancer (NSCLC) or small cell lung cancer (SCLC). NSCLC accounts for about 80% to 85% of diagnosed lung cancer cases, whereas SCLC accounts for 10% to 15%. It remains a challenge for physicians to identify patients who will benefit from chemotherapy. Identifying genes that can facilitate therapeutic target discovery, and better understanding disease mechanisms and their regulation at different stages of lung cancer, therefore remain important topics of research.

In this thesis, we develop a computational framework for modelling molecular gene interaction networks, called Genet-CNV, to analyse gene interactions based on DNA Copy Number Variations (CNV). DNA copy number variation is a phenomenon in which sections of the genome are repeated and the number of repeats varies between individuals in the human population. These variations can be used to study the activity of genes in cancerous cells, compared with that of the normal population. Genet-CNV uses Boolean implication networks to investigate genome-wide DNA CNV and to identify relationships, called rules, that could lead to the identification of genes of significant biological interest. Boolean implication networks are probabilistic graphical models that express the relationship between two variables in terms of six implication rules, which can describe whether the genes are co-amplified, co-deleted, or differentially amplified and deleted. Genet-CNV is run on three publicly available NSCLC genomic datasets. We further evaluate the results obtained with Genet-CNV by comparing them with the benchmark dataset, The Molecular Signatures Database (MSigDB). We identified several genes of interest that are present in survival, apoptosis, proliferation and immunologic pathways. The relationships obtained from this analysis can be tested in biological validation studies or used to confirm experimental results, thus facilitating the identification of genes that play a significant role in the causation and progression of NSCLC.
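To make the idea of implication rules between two genes concrete, the following is a minimal illustrative sketch and not the Genet-CNV implementation. It assumes each gene's copy number has already been discretized to 1 (amplified) or 0 (not amplified) per sample. It further assumes that the six rules are the four one-way implications plus the two two-way relations, and that a rule holds when the contradicting cells of the 2x2 contingency table are nearly empty. The function name `implication_rules` and the threshold `max_violation` are hypothetical, and the counting criterion is a deterministic simplification of the probabilistic model described above.

```python
from collections import Counter


def implication_rules(a, b, max_violation=0.05):
    """Return the implication rules supported by two binary sequences.

    a, b: equal-length sequences of 0/1 values, one entry per sample
          (assumed encoding: 1 = amplified, 0 = not amplified).
    max_violation: hypothetical threshold; a contingency-table cell whose
          share of samples is at or below it is treated as "sparse".
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("a and b must be non-empty and of equal length")

    n = len(a)
    counts = Counter(zip(a, b))  # missing cells count as 0

    # Cells of the 2x2 table that hold (almost) no samples.
    sparse = {
        (x, y)
        for x in (0, 1)
        for y in (0, 1)
        if counts[(x, y)] / n <= max_violation
    }

    rules = []
    # A one-way rule holds when the single cell contradicting it is sparse.
    if (1, 0) in sparse:
        rules.append("A => B")
    if (1, 1) in sparse:
        rules.append("A => not B")
    if (0, 0) in sparse:
        rules.append("not A => B")
    if (0, 1) in sparse:
        rules.append("not A => not B")
    # A two-way rule holds when both contradicting cells are sparse.
    if (1, 0) in sparse and (0, 1) in sparse:
        rules.append("A <=> B")
    if (1, 1) in sparse and (0, 0) in sparse:
        rules.append("A <=> not B")
    return rules


# Two genes amplified in exactly the same samples.
print(implication_rules([1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0]))
```

Under this encoding, "A <=> B" would correspond to two genes that are amplified together, while "A <=> not B" would correspond to genes that are differentially amplified. A gene that is constant across all samples trivially satisfies several rules in this sketch, so a practical analysis would need to filter such cases and to assess statistical significance.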