Eberly College of Arts and Sciences
Physics and Astronomy
The use of machine learning and data mining techniques has grown rapidly across many disciplines in recent years, and the field of educational data mining has expanded significantly over the past 15 years. In this study, random forest and logistic regression models were used to construct early warning models of student success in introductory calculus-based mechanics (Physics 1) and electricity and magnetism (Physics 2) courses at a large eastern land-grant university. By combining in-class variables such as homework grades with institutional variables such as cumulative GPA, logistic regression models using only data available in the first week of class can predict whether a student will receive less than a “B” in the course with 73% accuracy in Physics 1 and 81% accuracy in Physics 2. The institutional variables were critical for high accuracy in the first four weeks of the semester. In-class variables became more important only after the first in-semester examination was administered. The student’s cumulative college GPA was consistently the most important institutional variable. Homework grade became the most important in-class variable after the first week and increased steadily in importance as the semester progressed; it became more important than cumulative GPA after the first in-semester examination. Demographic variables, including gender, race or ethnicity, and first-generation status, were not important for predicting course grade.
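The kind of early warning classifier described in the abstract can be illustrated with a minimal sketch. It is not the authors' model or data: the students below are synthetic, the relationship between GPA, homework grade, and course outcome is invented for illustration, and every function name is hypothetical. The sketch fits a logistic regression by plain gradient descent on two standardized predictors (cumulative GPA and homework percentage) and reports held-out accuracy for the binary outcome "received less than a B".

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic_students(n, seed=0):
    """Generate HYPOTHETICAL [gpa, homework %] rows and below-B labels."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        gpa = min(4.0, max(0.0, rng.gauss(3.0, 0.6)))
        homework = min(100.0, max(0.0, rng.gauss(60.0 + 8.0 * gpa, 12.0)))
        # Assumed, made-up relationship: lower GPA and homework raise risk.
        risk = sigmoid(10.0 - 2.5 * gpa - 0.03 * homework)
        rows.append([gpa, homework])
        labels.append(1 if rng.random() < risk else 0)
    return rows, labels


def fit_scaler(rows):
    """Per-feature mean and standard deviation used for standardization."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return means, stds


def scale(rows, means, stds):
    """Standardize each feature with the supplied means and deviations."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def predict_proba(row, w, b):
    """Predicted probability that a student receives less than a B."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, row)))


def fit_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit weights and intercept by batch gradient descent on the log loss."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(x, w, b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(rows, labels, w, b):
    """Fraction of students classified correctly at a 0.5 threshold."""
    hits = sum((predict_proba(r, w, b) >= 0.5) == bool(y)
               for r, y in zip(rows, labels))
    return hits / len(rows)


# Train on 400 synthetic students and evaluate on 200 held-out ones.
rows, labels = make_synthetic_students(600)
train_x, test_x = rows[:400], rows[400:]
train_y, test_y = labels[:400], labels[400:]
means, stds = fit_scaler(train_x)
w, b = fit_logistic(scale(train_x, means, stds), train_y)
test_acc = accuracy(scale(test_x, means, stds), test_y, w, b)
print("standardized weights (gpa, homework):", w)
print("held-out accuracy:", test_acc)
```

Because the predictors are standardized, the magnitudes of the fitted weights give a rough sense of each variable's relative importance in this toy setting. The accuracy printed here reflects only the invented data and says nothing about the accuracies reported in the study.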
Digital Commons Citation
Zabriskie, Cabot; Yang, Jie; DeVore, Seth; and Stewart, John, "Using Machine Learning to Predict Physics Course Outcomes" (2019). Faculty & Staff Scholarship. 1848.
Zabriskie, C., Yang, J., DeVore, S., & Stewart, J. (2019). Using machine learning to predict physics course outcomes. Physical Review Physics Education Research, 15(2). https://doi.org/10.1103/physrevphyseducres.15.020120