Date of Graduation


Document Type


Degree Type



Statler College of Engineering and Mineral Resources


Not Listed

Committee Chair

Donald Adjeroh

Committee Co-Chair

Yanfang Ye

Committee Member

Lee Pyles


Chronic Kidney Disease (CKD) is the leading cause of kidney failure. It is a global health problem affecting approximately 10% of the world population and about 15% of US adults. Chronic kidney diseases generally show no disease-specific symptoms in their early stages, which makes them hard to detect and prevent. Early detection and classification are therefore key factors in managing chronic kidney diseases.

In this thesis, we propose a new machine learning technique for kidney ailment prediction. We focus on two key issues in machine learning, especially as applied to disease prediction. The first is the class imbalance problem, which occurs when at least one class is represented by significantly fewer samples than the others in the training set. Classifiers trained on an imbalanced dataset tend to assign all samples to the majority class, ignoring the minority class samples. The second issue concerns the specific type of data to be used for a given problem. Here, we focus on predicting kidney diseases from patient information extracted from laboratory and questionnaire data. Most recent approaches for predicting kidney diseases or other chronic diseases rely on prescription drug usage. In this study, we instead use patients' biomarker and anthropometry data to analyze and predict kidney-related diseases.
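The class imbalance issue described above can be made concrete with a small sketch. The 90/10 class split and the always-majority "classifier" below are hypothetical illustrations, not data or models from the thesis; the point is only that overall accuracy can look high while every minority-class sample is missed.

```python
# Hypothetical toy labels: 90 negative (0) and 10 positive (1) samples.
y_true = [0] * 90 + [1] * 10

# A degenerate classifier that always predicts the majority class.
y_pred = [0] * len(y_true)

# Overall accuracy: fraction of samples predicted correctly.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Minority-class recall: fraction of positive samples that were found.
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(t == 1 for t in y_true)

print(f"accuracy={accuracy:.2f}, minority recall={recall:.2f}")
# prints: accuracy=0.90, minority recall=0.00
```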

In this research, we adopted a learning approach based on repeated random data sub-sampling to tackle the class imbalance problem. This technique divides the samples into multiple sub-samples while keeping each training sub-sample completely balanced. We then trained classification models on the balanced data to predict the risk of kidney failure. Further, we developed an intelligent fusion mechanism that combines information from the biomarker and anthropometry data sets for improved prediction accuracy and stability. Results are included to demonstrate the performance.
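The sub-sampling step described above might look roughly like the following. This is a minimal sketch of one possible reading, not the thesis implementation: it assumes binary labels, and it assumes each balanced sub-sample pairs all minority-class samples with a disjoint, equally sized random chunk of the majority class. The function name `balanced_subsamples` and the toy labels are hypothetical.

```python
import random
from collections import Counter


def balanced_subsamples(labels, seed=0):
    """Split sample indices into class-balanced training sub-samples.

    Assumption (not from the thesis): binary labels, and each sub-sample
    combines every minority-class index with an equally sized, disjoint
    random chunk of the majority-class indices.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("this sketch assumes exactly two classes")
    (majority, _), (minority, n_min) = counts.most_common(2)
    maj_idx = [i for i, y in enumerate(labels) if y == majority]
    min_idx = [i for i, y in enumerate(labels) if y == minority]

    # Shuffle the majority indices so each chunk is a random draw.
    rng = random.Random(seed)
    rng.shuffle(maj_idx)

    subsamples = []
    # Walk the shuffled majority indices in chunks of the minority size;
    # a trailing partial chunk is dropped so every sub-sample stays balanced.
    for start in range(0, len(maj_idx) - n_min + 1, n_min):
        chunk = maj_idx[start:start + n_min]
        subsamples.append(sorted(chunk + min_idx))
    return subsamples


# Hypothetical toy labels: 9 majority (0) and 3 minority (1) records.
labels = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
subsamples = balanced_subsamples(labels)
for idx in subsamples:
    print(idx, Counter(labels[i] for i in idx))
```

Under these assumptions, one classifier could be trained per sub-sample and the predictions combined afterwards, for example by voting; how the thesis combines its models is not stated in the abstract.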