Date of Graduation


Document Type


Degree Type



School of Pharmacy


Pharmaceutical Systems and Policy

Committee Chair

Usha Sambamoorthi

Committee Co-Chair

Nilanjana Dwibedi

Committee Member

Traci J. LeMasters

Committee Member

Ranjita Misra

Committee Member

Danielle E. Rose


There is robust evidence that heart failure (HF) is associated with substantial mortality, morbidity, poor health-related quality of life, healthcare utilization, and economic burden. Previous research has revealed sex differences in the epidemiology, etiology, and disease burden of HF. However, research on HF among women, especially postmenopausal women, is limited. To fill this knowledge gap, the three related aims of this dissertation were to: (1) identify knowledge gaps in HF research among women, especially postmenopausal women, using unsupervised machine learning methods and big data (i.e., articles published in PubMed); (2) identify emerging predictors (i.e., polypharmacy and some prescription medications) of incident HF among postmenopausal women using supervised machine learning methods; and (3) identify leading predictors of HF-related emergency room use among postmenopausal women using supervised machine learning methods with data from a large commercial insurance claims database in the United States.
In the first aim, non-negative matrix factorization algorithms were used to cluster HF articles based on their primary topic. Clusters were independently validated and labeled by three investigators familiar with HF research. The most understudied area among women was atrial fibrillation. Among postmenopausal women, the most understudied topic was stress-induced cardiomyopathy.
For the second and third aims, a retrospective cohort design was used with de-identified health insurance claims data from Optum’s de-identified Clinformatics® Data Mart Database (Optum, Eden Prairie, MN). In the second aim, multivariable logistic regression and three classification machine learning algorithms (cross-validated logistic regression (CVLR), random forest (RF), and eXtreme Gradient Boosting (XGBoost)) were used to identify predictors of incident HF among postmenopausal women. The associations of the leading predictors with incident HF were explored using SHapley Additive exPlanations (SHAP), an interpretable machine learning technique. The eight leading predictors of incident HF consistent across all models were: older age, arrhythmia, polypharmacy, Medicare, chronic obstructive pulmonary disease (COPD), coronary artery disease, hypertension, and chronic kidney disease. Some prescription medications, such as sulfonylureas and antibiotics other than fluoroquinolones, predicted incident HF in some machine learning algorithms.
In the third aim, a random forest algorithm was used to identify predictors of HF-related emergency room use among postmenopausal women. Interpretable machine learning techniques were used to explain the associations of the leading predictors with HF-related emergency room use. The random forest algorithm had high predictive accuracy in the test dataset (area under the curve: 94%, sensitivity: 93%, specificity: 77%, and accuracy: 81%). The number of HF-related emergency room visits at baseline, fragmented care, age, insurance type (Health Maintenance Organization), and coronary artery disease were the top five predictors of HF-related emergency room use among postmenopausal women. Partial dependence plots suggested positive associations of the top predictors with HF-related emergency room use, with the exception of insurance type, which was negatively associated with HF-related emergency room use.
Findings from this dissertation suggest that machine learning algorithms can achieve comparable or better predictive accuracy than traditional statistical models.
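For readers less familiar with the test-set metrics reported for the third aim, the following is a minimal sketch of how sensitivity, specificity, accuracy, and area under the curve are typically computed for a binary classifier. It is an illustration only: the function names are hypothetical, the labels and scores are invented toy values rather than data from the dissertation, and the 0.5 decision threshold is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from predicted class labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # share of true events that were flagged
        "specificity": tn / (tn + fp),  # share of non-events correctly not flagged
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case receives a higher score than a randomly chosen
    negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented toy data (NOT from the dissertation): 1 = had an HF-related ER visit.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

print(classification_metrics(toy_labels, toy_predictions))
print(auc(toy_labels, toy_scores))
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, while the area under the curve summarizes ranking quality across all thresholds, which is one reason the four figures can differ as much as they do in the results above.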

Embargo Reason

Publication Pending

Available for download on Tuesday, August 24, 2021