Date of Graduation
2017
Document Type
Thesis
Degree Type
MS
College
School of Public Health
Department
Epidemiology
Committee Chair
Michael Regier
Committee Co-Chair
Roger D Parker
Committee Member
Sijin Wen
Abstract
Missing data is a common problem encountered in statistical analysis. However, little is known about how bias-inducing missing-at-random (MAR) missing data mechanisms affect predictive model performance measures such as sensitivity, specificity, error rate, ROC curves, and AUC. I investigate the effect of MAR missing data mechanisms on a single-layer artificial neural network with a sigmoidal activation function, which is equivalent to a binary logistic regression. Binary logistic regression is frequently used in health research, so it is a logical starting point for understanding the effects of missing data on statistical learning models that could be used in health research. I then examine whether multiple imputation is a useful analytic correction for improving the predictive model performance measures relative to a complete case analysis.
Two simulation studies are conducted to understand how the complexity of the missing data mechanism, the type of covariate missing, and the rate of missing values affect the measures of interest, and whether multiple imputation is robust to the various scenarios investigated. Sensitivity, specificity, and error rate estimates were found to be biased in all scenarios, and the magnitude of the bias increased as the missing rate increased. However, the AUC remained unbiased. Multiple imputation was observed to be an effective correction for missing values, decreasing the bias of the performance measures relative to the complete case analysis.
I conclude that MAR missing data mechanisms do affect performance measures such as sensitivity, specificity, and error rate estimates, but that multiple imputation is a useful analytic correction for reducing the bias of these measures. Caution is advised when reporting AUC, which should be reported alongside other measures such as sensitivity and specificity.
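The setup the abstract describes can be illustrated with a minimal sketch. It is not the thesis's simulation design: the sample size, coefficients, missingness model, cutoff, and every function name below are assumptions chosen only for illustration. It draws data from a logistic model, deletes one covariate with a probability that depends only on always-observed quantities (a MAR mechanism), trains a single sigmoid unit by gradient descent, and computes sensitivity and specificity on the full data and on the complete cases. Multiple imputation is not implemented here.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, seed):
    """Draw (x1, x2, y) from an assumed logistic model (illustrative values)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y = 1 if rng.random() < sigmoid(-0.5 + x1 + x2) else 0
        rows.append((x1, x2, y))
    return rows


def impose_mar(rows, seed):
    """Blank out x1 with a probability that depends only on x2 and y,
    which are always observed, so the mechanism is MAR (assumed form)."""
    rng = random.Random(seed)
    out = []
    for x1, x2, y in rows:
        p_miss = sigmoid(-1.0 + x2 + y)
        out.append((None if rng.random() < p_miss else x1, x2, y))
    return out


def fit(rows, epochs=300, lr=0.5):
    """Train one sigmoid unit by batch gradient descent on the log loss."""
    w = [0.0, 0.0, 0.0]  # intercept, weight for x1, weight for x2
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x1, x2, y in rows:
            err = sigmoid(w[0] + w[1] * x1 + w[2] * x2) - y
            grad[0] += err
            grad[1] += err * x1
            grad[2] += err * x2
        w = [wi - lr * gi / len(rows) for wi, gi in zip(w, grad)]
    return w


def sens_spec(w, rows, cutoff=0.5):
    """Sensitivity and specificity of the fitted unit at a probability cutoff."""
    tp = fn = tn = fp = 0
    for x1, x2, y in rows:
        pred = 1 if sigmoid(w[0] + w[1] * x1 + w[2] * x2) >= cutoff else 0
        if y == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    return tp / (tp + fn), tn / (tn + fp)


full = simulate(2000, seed=1)
observed = impose_mar(full, seed=2)
# Complete case analysis: keep only rows where x1 was observed.
complete = [r for r in observed if r[0] is not None]

reference = sens_spec(fit(full), full)
complete_case = sens_spec(fit(complete), complete)
print("missing rate:", 1 - len(complete) / len(full))
print("full data (sens, spec):", reference)
print("complete cases (sens, spec):", complete_case)
```

A single run like this shows only one realization. Assessing bias, as the abstract describes, would require repeating the simulation many times and averaging the difference between the complete-case and full-data measures.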
Recommended Citation
Dick, Taron, "The Effect of a Missing at Random Missing Data Mechanism on a Single Layer Artificial Neural Network with a Sigmoidal Activation Function and the Use of Multiple Imputation as a Correction" (2017). Graduate Theses, Dissertations, and Problem Reports. 5493.
https://researchrepository.wvu.edu/etd/5493