Semester

Fall

Date of Graduation

2019

Document Type

Thesis

Degree Type

MS

College

Statler College of Engineering and Mineral Resources

Department

Not Listed

Committee Chair

Donald Adjeroh

Committee Co-Chair

Yanfang Ye

Committee Member

Lee Pyles

Abstract

Chronic Kidney Disease (CKD) is the leading cause of kidney failure. It is a global health problem affecting approximately 10% of the world population and about 15% of US adults. CKD generally shows no disease-specific symptoms in its early stages, so it is hard to detect and prevent. Early detection and classification are therefore key factors in managing CKD.

In this thesis, we propose a new machine learning technique for kidney ailment prediction. We focus on two key issues in machine learning, especially in its application to disease prediction. The first is the class imbalance problem, which occurs when at least one class is represented by significantly fewer samples than the others in the training set. With an imbalanced dataset, classifiers tend to assign all samples to the majority class and ignore the minority class samples. The second issue is the specific type of data to be used for a given problem. Most recent approaches for predicting kidney diseases or other chronic diseases rely on patients' usage of prescription drugs. In this study, we instead focus on patient information extracted from laboratory and questionnaire data, specifically biomarker and anthropometry data, to analyze and predict kidney-related diseases.

In this research, we adopted a learning approach that uses repeated random data sub-sampling to tackle the class imbalance problem. This technique divides the samples into multiple sub-samples while keeping each training sub-sample completely balanced. We then trained classification models on the balanced data to predict the risk of kidney failure. Further, we developed an intelligent fusion mechanism that combines information from both the biomarker and anthropometry data sets for improved prediction accuracy and stability. Results are included to demonstrate the performance of the proposed approach.
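
As an illustration of the sub-sampling idea described above, the following is a minimal sketch and not the thesis's actual implementation. It assumes a two-class problem, and the function name `balanced_subsamples`, the `seed` parameter, and the toy labels are hypothetical. It shuffles the majority-class indices, cuts them into chunks the size of the minority class, and pairs each chunk with all minority-class samples, so every training sub-sample is exactly balanced.

```python
import random
from collections import defaultdict


def balanced_subsamples(labels, seed=0):
    """Split sample indices into class-balanced training sub-samples.

    Hypothetical helper: majority-class indices are shuffled and cut into
    chunks the size of the minority class; each chunk is combined with all
    minority-class indices.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    if len(by_class) != 2:
        raise ValueError("this sketch handles exactly two classes")

    # The shorter index list is the minority class.
    minority, majority = sorted(by_class.values(), key=len)

    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)

    k = len(minority)
    subsamples = []
    # A trailing partial chunk is dropped so every sub-sample stays balanced.
    for start in range(0, len(shuffled) - k + 1, k):
        chunk = shuffled[start:start + k]
        subsamples.append(sorted(chunk + minority))
    return subsamples


# Toy labels (assumed data): 3 minority samples (1) and 9 majority samples (0).
labels = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
subsamples = balanced_subsamples(labels)
for s in subsamples:
    print(s, [labels[i] for i in s])
```

One classifier could then be trained per sub-sample and the resulting predictions combined. How the thesis combines them is not stated in this abstract, so that step is left out of the sketch.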
