Predicting Student Performance on Virtual Learning Environment

Alnassar, Fatema Mohammad. 2023. Predicting Student Performance on Virtual Learning Environment. Doctoral thesis, Goldsmiths, University of London [Thesis]
Copy

Virtual learning has gained increased importance because of the recent pandemic situation. A mass shift to virtual means of education delivery has been observed over the past couple of years, forcing the community to develop efficient performance assessment tools. Prediction of students performance using different relevant information has emerged as an efficient tool in educational institutes towards improving the curriculum and teaching methodologies. Automated analysis of educational data using state of the art Machine Learning (ML) and Artificial Intelligence (AI) algorithms is an active area of research.

The research presented in this thesis addresses the problem of students performance prediction comprehensively by applying multiple machine learning models (i.e., Multilayer Perceptron (MLP), Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), CATBoost, K-Nearest Neighbour (KNN) and Support Vector Classifier (SVC)) on the two benchmark VLE datasets (i.e., Open University Learning Analytics Dataset (OULAD), Coursera). In this context, a series of experiments are performed and important insights are reported. First, the classification performance of machine learning models has been investigated on both OULAD and Coursera datasets. In the second experiment, performance of machine learning models is studied for each course of Coursera dataset and comparative analysis are performed. From the Experiment 1 and Experiment 2, the class imbalance is reported as the highlighted factor responsible for degraded performance of machine learning models. In this context, Experiment 3 is designed to address the class imbalance problem by making use of multiple Synthetic Minority Oversampling Technique (SMOTE) and generative models (i.e., Generative Adversial Networks (GANs)). From the results, SMOTE NN approach was able to achieve best classification performance among the implemented SMOTE techniques. Further, when mixed with generative models, the SMOTENN-GAN generated Coursera dataset was the best on which machine learning models were able to achieve the classification accuracy around 90%. Overall, MLP, XGBoost and CATBoost machine learning models were emerged as the best performing in context to different experiments performed in this thesis.


picture_as_pdf
COM_thesis_AlnassarF_2023.pdf
subject
Accepted Version
Available under Creative Commons: Attribution-NonCommercial-No Derivative Works 4.0

View Download

Atom BibTeX OpenURL ContextObject in Span OpenURL ContextObject Dublin Core Dublin Core MPEG-21 DIDL Data Cite XML EndNote HTML Citation METS MODS RIOXX2 XML Reference Manager Refer ASCII Citation
Export

Downloads