Author

Taron Dick

Date of Graduation

2017

Document Type

Thesis

Degree Type

MS

College

School of Public Health

Department

Epidemiology

Committee Chair

Michael Regier

Committee Co-Chair

Roger D Parker

Committee Member

Sijin Wen

Abstract

Missing data is a common problem encountered in statistical analysis. However, little is known about how bias-inducing missing at random missing data mechanisms affect predictive model performance measures such as sensitivity, specificity, error rate, ROC curves, and AUC. I investigate the effect of missing at random missing data mechanisms on a single-layer artificial neural network with a sigmoidal activation function, which is equivalent to a binary logistic regression. Binary logistic regression is frequently used in health research, so it is a logical starting point for understanding the effects of missing data on statistical learning models that could be used in health research. I then examine whether multiple imputation is a useful analytic correction for improving the predictive model performance measures relative to a complete case analysis.

Two simulation studies are conducted to understand how the complexity of the missing data mechanism, the type of covariate missing, and the rate of missing values affect the measures of interest, and whether multiple imputation is robust to the various scenarios investigated. Sensitivity, specificity, and error rate estimates were biased in all scenarios, and the magnitude of the bias increased as the missing rate increased. However, the AUC remained unbiased. Multiple imputation was an effective correction for missing values, decreasing the bias of the performance measures relative to the complete case analysis.

I conclude that missing at random missing data mechanisms do affect performance measures such as sensitivity, specificity, and error rate estimates, but multiple imputation is a useful analytic correction for reducing the bias of these measures. Caution is advised when reporting AUC, which should be reported alongside other measures such as sensitivity and specificity.
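
The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis's simulation design: the sample size, coefficients, the specific missingness model (missingness in one covariate depending on the observed outcome and the other covariate), the 0.5 classification threshold, and all function names are assumptions chosen for illustration. It covers only the complete case analysis and does not implement multiple imputation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25):
    """Fit a binary logistic regression by Newton-Raphson.

    X must already contain an intercept column.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)
        grad = X.T @ (y - p)              # score vector
        hess = X.T @ (X * w[:, None])     # observed information matrix
        beta += np.linalg.solve(hess, grad)
    return beta

def sens_spec(y_true, p_hat, threshold=0.5):
    """Sensitivity and specificity at a fixed probability threshold."""
    pred = p_hat >= threshold
    pos = y_true == 1
    sensitivity = np.mean(pred[pos])
    specificity = np.mean(~pred[~pos])
    return sensitivity, specificity

# Assumed data-generating model (illustrative values only).
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = rng.binomial(1, sigmoid(-0.5 + 1.0 * x1 + 1.0 * x2))

# Assumed missing at random mechanism: the chance that x2 is missing
# depends only on fully observed quantities (y and x1), never on x2 itself.
p_miss = sigmoid(-1.5 + 1.5 * y + 0.5 * x1)
missing = rng.random(n) < p_miss

X = np.column_stack([np.ones(n), x1, x2])
beta_full = fit_logistic(X, y)                      # fit on all rows
beta_cc = fit_logistic(X[~missing], y[~missing])    # complete case fit

# Evaluate both fitted models on the same full data set.
full = sens_spec(y, sigmoid(X @ beta_full))
cc = sens_spec(y, sigmoid(X @ beta_cc))
print("missing rate:", missing.mean())
print("full data sensitivity, specificity:", full)
print("complete case sensitivity, specificity:", cc)
```

Repeating this over many simulated data sets and several missing rates would give an estimate of the bias in each measure under the assumed mechanism.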
