Abstract:
Retaining students through to successful graduation is an enduring issue in higher education. National
statistics indicate that most higher education institutions have four-year degree completion
rates of around 50%; only about half of their students graduate in four years. While there are prediction
models that illuminate the factors improving students' chances of success, research has yet
to clearly identify interventions that support course selections on a semester-to-semester
basis. In this thesis, we highlight the potential of an ambitious academic advising program
to improve student retention and learning outcomes. Given the complex demands of such
an advising program, we posit that the development of an intelligent automated advising
system is essential to its success. To further this goal, we develop a system to predict
students' grades in the courses they will enroll in during the next enrollment term. We
take a data-driven approach, learning patterns from historical transcript data coupled with
additional information about students, courses, and the instructors teaching them.
We explore a variety of classic and state-of-the-art techniques that have proven effective
for recommendation tasks in the e-commerce domain. In our experiments, Factorization
Machines (FM), Random Forests (RF), and the Personalized Multi-Linear Regression
(PLMR) model achieve the lowest prediction error. We introduce a novel feature selection
technique that is key to the predictive success and interpretability of the FM. By comparing
feature importance across populations and across models, we uncover strong connections
between instructor characteristics and student performance. We also discover key differences
between transfer and non-transfer students. Ultimately, we find that a hybrid FM-RF
method can be used to accurately predict grades for both new and returning students
taking both newly introduced and well-established courses.
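As a point of reference, the standard second-order factorization machine scores a feature vector with a bias, a linear term, and pairwise interactions weighted by dot products of latent factor vectors. The sketch below shows that general prediction rule in plain Python, together with the well-known linear-time reformulation of the pairwise term. It is illustrative only: the feature layout (one-hot student, course, and instructor indicators) and all parameter values are hypothetical, and it is not the trained model or the feature selection technique developed in this thesis.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction for one feature vector.

    x  : list of n feature values
    w0 : global bias
    w  : list of n linear weights
    V  : n-by-k list of latent factor rows, one row per feature
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} v_if v_jf x_i x_j
        #         = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same prediction via the explicit double loop, for checking."""
    n = len(x)
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


# Hypothetical toy input: features are [student A, student B, course X,
# course Y, instructor Z]; this row encodes "student A takes course X
# taught by instructor Z". All numbers are made up for illustration.
x = [1.0, 0.0, 1.0, 0.0, 1.0]
w0 = 2.5
w = [0.2, -0.1, 0.3, 0.0, -0.2]
V = [[0.1, 0.2], [0.0, 0.3], [0.4, -0.1], [0.2, 0.2], [-0.3, 0.5]]

predicted_grade = fm_predict(x, w0, w, V)
print(predicted_grade)
```

Because only the active student, course, and instructor features are nonzero, the pairwise term reduces to three interactions (student-course, student-instructor, course-instructor), which is what lets such a model relate instructor characteristics to an individual student's predicted grade.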
Unlike most e-commerce recommendations, academic advising often has long-lasting
impacts on the student, on institutions, and on society. National studies show that
students with a bachelor's degree earn an average of 62% more than those with a high
school diploma. Meanwhile, students who start but fail to successfully complete a university
degree program represent lost revenue for institutions and generate debt that burdens
society; the cumulative losses from dropouts across the United States run into the billions of dollars.
In such high-impact advising scenarios, explainability of recommendations becomes
essential for their adoption. To address this concern, we explore probabilistic techniques
that are competitive with state-of-the-art methods but yield superior prediction explanations. In
particular, we develop a novel method called Profiling Mixtures of Linear Regressions that
matches the performance of PLMR. We derive an efficient Gibbs sampling inference algorithm
to infer a full posterior distribution for this model. We then demonstrate through a
variety of informative visualizations how this posterior distribution can be used to assist advisors
in making academic degree planning recommendations that are clear and actionable
for advisees.
The work in this thesis represents progress towards a truly intelligent degree planning
system. Development of such a system holds promise for student degree planning, instructor
interventions, and personalized advising, each of which could improve retention and
student learning outcomes. Given the billion-dollar scale of the retention problem, successful
application of the techniques in this thesis could bring significant gains for individuals,
institutions, and society.