Author ORCID Identifier
Semester
Fall
Date of Graduation
2025
Document Type
Thesis (Campus Access)
Degree Type
MS
College
Statler College of Engineering and Mineral Resources
Department
Industrial and Management Systems Engineering
Committee Chair
Avishek Choudhury
Committee Member
Ashish Nimbarte
Committee Member
JuHeyong Ryu
Abstract
Cancer is one of the most significant contributors to the global burden of disease, accounting for nearly 10 million deaths annually. Despite progress in reducing overall cancer mortality, the rising incidence of several major cancer types, combined with persistent racial, ethnic, and geographic disparities, shows that cancer remains a pressing and evolving threat to public health in the USA. Environmental pollution (air, water, and land) is recognized as a major determinant of cancer risk, yet most studies continue to examine pollutants in isolation or rely on broad cumulative indices that obscure the role of individual exposures. The lack of studies that consider these exposures jointly while still capturing their individual contributions limits the understanding of how diverse environmental and social factors together shape cancer prevalence at the national level.
The main objective of this thesis is to evaluate how diverse environmental factors contribute to cancer prevalence across the USA, using an approach that considers them together in a single model while preserving their distinct individual effects. To achieve this, several parametric and nonparametric models, including binomial logistic regression, a random forest classifier, gradient boosting, and an artificial neural network, were formulated and compared on the EJScreen and HDLP2020 datasets to identify the best-performing model.
Performance was moderate and consistent across all tested models, with the best results from the tuned random forest model, which achieved an AUC-ROC of 0.691, a sensitivity of 0.65, and a specificity of 0.67. The Shapley additive explanations (SHAP) analysis of the random forest model identified older age, people of color, and unemployment as the dominant predictive factors, with secondary contributions from smoking and fine particulate matter levels. The social and demographic covariates were more dominant predictors than the pollution-related covariates in the final model. This indicates that social determinants are closely intertwined with pollution exposures and are crucial in determining cancer prevalence. It also suggests the need for multifaceted policies that address both pollution and socio-economic challenges to reduce cancer prevalence in the USA.
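As an illustration of the evaluation metrics reported above (AUC-ROC, sensitivity, and specificity), the following minimal Python sketch shows how they can be computed for a binary classifier. It is not the thesis code: the function names `sensitivity_specificity` and `auc_roc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and do not come from the EJScreen or HDLP2020 data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    # Sensitivity: share of true positives detected.
    # Specificity: share of true negatives correctly rejected.
    return tp / (tp + fn), tn / (tn + fp)


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    # Count pairwise wins; ties between a positive and a negative count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not thesis results): 1 = high prevalence, 0 = low prevalence.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} auc={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, which is why the three values are usually reported together.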
Recommended Citation
Sharma Timilsina, Sagar, "Studying the Association Between Environmental Pollution and Cancer Prevalence in the USA" (2025). Graduate Theses, Dissertations, and Problem Reports. 13108.
https://researchrepository.wvu.edu/etd/13108