Graduate Theses, Dissertations, and Problem Reports

Application of Interpretable Machine Learning Methods to Study the Disease Characteristics and Healthcare Expenditures in Hodgkin’s Lymphoma

Zasim Azhar Dil Hasan Siddiqui

Author ORCID Identifier

https://orcid.org/0000-0002-6719-3228

Semester

Spring

Date of Graduation

2024

Document Type

Dissertation

Degree Type

PhD

College

School of Pharmacy

Department

Pharmaceutical Systems and Policy

Committee Chair

Sabina Nduaguba

Committee Member

Usha Sambamoorthi

Committee Member

Virginia G. Scott

Committee Member

Traci LeMasters

Committee Member

Jay S. Patel

Abstract

Hodgkin’s lymphoma (HL) is a rare malignancy of lymphocytes that occurs predominantly in young adults aged 20-30 years and in older adults aged 65-75 years. Despite its low incidence, there were an estimated 223,512 HL survivors in the US in 2020. HL has a favorable prognosis among young adults, with a cure rate of 85-90%. Older adults, however, have a poor prognosis, with a 5-year overall survival rate of 40-55% among patients over 60 years. HL survivors incur high total and out-of-pocket (OOP) healthcare expenditures, averaging $78,183 and $4,180 per patient, respectively, in the first year after diagnosis, which represents a considerable economic burden. Despite this burden, the existing cost-related HL literature has focused primarily on young adults and has overlooked costs among older adults. Healthcare costs among older HL survivors could be substantial because of their poor prognosis and the complexity of their care, which arises from factors such as comorbidities and adjusted chemotherapy regimens.

Machine learning (ML) methods are increasingly used to predict healthcare costs. However, they pose challenges, including algorithmic bias, which particularly affects underprivileged groups such as females, non-whites, and individuals of lower economic status. Given HL’s low incidence and its prevalence among young adults, who often use the internet and social media to obtain health-related information, we adopted a data-driven approach that leverages claims and social media data to address gaps in the literature on healthcare expenditures among older HL patients and to explore the feasibility of using social media to study HL. We pursued these objectives through three related research aims:

1. Determine the leading predictors of Medicare and OOP healthcare expenditures in older HL survivors across different phases of cancer care using interpretable ML methods.
2. Assess the fairness of ML models in predicting healthcare expenditures in HL patients based on their sensitive attributes: sex, race, and economic status.
3. Assess the feasibility of using social media data to study the disease and treatment characteristics of HL.

We used a retrospective research design with data from multiple sources to address aims 1 and 2. We used Surveillance, Epidemiology, and End Results (SEER) data linked with fee-for-service (FFS) Medicare claims for patients with a primary diagnosis of incident HL between 2009 and 2017, with a two-year baseline period and a two-year follow-up period. In addition to the SEER-Medicare data, we incorporated geographic information from the SEER census and zip code files and publicly available data from the Area Health Resource File (AHRF) and the County Health Ranking File (CHRF). We employed multiple ML models, including linear regression, random forest, and XGBoost, and used SHapley Additive exPlanations (SHAP) values to determine each feature’s contribution to the model’s predictions. Model fairness was assessed using group fairness metrics covering independence, separation, and sufficiency. We also examined individual fairness through a counterfactual analysis, the Flip Test, which alters sensitive attributes from the privileged to the underprivileged group and assesses the resulting change in model performance. For aim 3, we analyzed posts from the X platform (formerly Twitter) spanning January 2010 to October 2022, extracting and identifying predefined HL-related classes and attributes using natural language processing (NLP) techniques, specifically named entity recognition (NER).

Our findings showed high Medicare and OOP healthcare expenditures among HL survivors during the pre-diagnosis, treatment, and post-treatment phases. XGBoost outperformed the other models in predicting Medicare and OOP expenditures. The interpretable ML methods identified baseline expenditures and chronic conditions as the leading predictors in the pre-diagnosis phase, whereas chemotherapy, immunotherapy, and surgery emerged as the leading predictors of expenditures during the treatment and post-treatment phases. Our fairness assessment showed that model accuracy varied by sensitive attribute, yet the model predominantly remained fair in both the group and individual fairness assessments. Aim 3 findings indicated high NER performance, with an accuracy of 86% and an F1 score of 87% in extracting HL-related classes and attributes from the free text of posts, demonstrating the potential of X as a valuable preliminary research source for rare diseases such as Hodgkin’s lymphoma.
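The Flip Test used for individual fairness can be illustrated with a short sketch. The Python below is a hypothetical, minimal illustration and not the dissertation's implementation: the toy linear model, the feature names (`baseline_cost`, `chemo`, `female`), the coefficients, the 0/1 coding of sex, and the three summary statistics are all assumptions chosen for the example. Any fitted model (for example, a random forest or XGBoost regressor) could be wrapped in a function with the same `predict` signature.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical record type: feature name -> numeric value.
Row = Dict[str, float]
Model = Callable[[Sequence[Row]], List[float]]


def flip_test(
    predict: Model,
    rows: Sequence[Row],
    attr: str,
    privileged: float,
    underprivileged: float,
    tol: float = 1e-6,
) -> Dict[str, float]:
    """Counterfactual Flip Test for a regression model.

    Each record in the privileged group is copied, only the sensitive
    attribute is switched to the underprivileged value, and the model's
    prediction for the copy is compared with its original prediction.
    """
    original = [r for r in rows if r[attr] == privileged]
    if not original:
        raise ValueError("no records in the privileged group")
    flipped = [{**r, attr: underprivileged} for r in original]

    before = predict(original)
    after = predict(flipped)
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return {
        "n": n,
        # Average signed shift in predicted expenditure after the flip.
        "mean_change": sum(diffs) / n,
        # Average magnitude of the shift, regardless of direction.
        "mean_abs_change": sum(abs(d) for d in diffs) / n,
        # Share of records whose prediction moved by more than `tol`.
        "share_changed": sum(abs(d) > tol for d in diffs) / n,
    }


def make_linear_model(weights: Row, intercept: float) -> Model:
    """Toy stand-in for a fitted expenditure model (hypothetical)."""

    def predict(rows: Sequence[Row]) -> List[float]:
        return [
            intercept + sum(w * r.get(k, 0.0) for k, w in weights.items())
            for r in rows
        ]

    return predict


# Hypothetical records; female = 0 (privileged) or 1 (underprivileged).
rows = [
    {"baseline_cost": 10000.0, "chemo": 1.0, "female": 0.0},
    {"baseline_cost": 5000.0, "chemo": 0.0, "female": 0.0},
    {"baseline_cost": 8000.0, "chemo": 1.0, "female": 1.0},
]

# One model ignores sex; the other applies a direct sex coefficient.
blind = make_linear_model({"baseline_cost": 1.2, "chemo": 15000.0}, 2000.0)
biased = make_linear_model(
    {"baseline_cost": 1.2, "chemo": 15000.0, "female": -3000.0}, 2000.0
)

print(flip_test(blind, rows, "female", privileged=0.0, underprivileged=1.0))
print(flip_test(biased, rows, "female", privileged=0.0, underprivileged=1.0))
```

Because the `blind` model has no term for `female`, flipping the attribute leaves every prediction unchanged, whereas the `biased` model shifts each flipped prediction by its `female` coefficient. How large a change should count as unfair is a study-specific judgment that this sketch does not settle.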

Recommended Citation

Siddiqui, Zasim Azhar Dil Hasan, "Application of Interpretable Machine Learning Methods to Study the Disease Characteristics and Healthcare Expenditures in Hodgkin’s Lymphoma" (2024). Graduate Theses, Dissertations, and Problem Reports. 12370.
https://researchrepository.wvu.edu/etd/12370

Included in

Pharmacoeconomics and Pharmaceutical Economics Commons, Pharmacy Administration, Policy and Regulation Commons

DOI

http://doi.org/10.33915/etd.12370