Wrapper Feature Selection Method for Predicting Student Dropout in Higher Education
Abstract
Background of Study: Student dropout in higher education is influenced by demographic, socioeconomic, macroeconomic, admission-related, and academic performance factors. Accurately identifying students at risk of dropping out is a significant challenge within educational data mining (EDM), especially when working with large, complex datasets.
Aims and Scope of Paper: This study aims to identify an optimal subset of features that can improve the accuracy of student dropout prediction. The scope includes comparing the effectiveness of different machine learning algorithms combined with a heuristic-based feature selection method to find the best-performing model.
Methods: A wrapper-based feature selection approach was employed, with Ant Colony Optimization (ACO) as the search strategy. ACO was paired with each of five classifiers, namely Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Neural Network (NN), to select the most relevant feature subsets. The performance of each combination was evaluated and compared.
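To make the wrapper idea concrete, the following is a minimal sketch of ACO-driven feature selection, not the authors' implementation. It rests on several assumptions that do not come from the study: the data are synthetic, a leave-one-out 1-nearest-neighbor classifier stands in for the five classifiers named above, each feature carries a single pheromone value that is used directly as its inclusion probability, and the names `make_data`, `loo_accuracy`, `aco_select`, `rho`, and `penalty` are hypothetical.

```python
import random

def make_data(n=60, n_features=8, n_informative=3, seed=0):
    """Synthetic binary data (assumption): only the first n_informative
    features carry class signal; the remaining features are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-NN classifier restricted to `subset`."""
    correct = 0
    for i, xi in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((xi[j] - X[k][j]) ** 2 for j in subset))
        correct += y[nearest] == y[i]
    return correct / len(X)

def aco_select(X, y, n_ants=8, n_iter=10, rho=0.2, penalty=0.01, seed=1):
    """Simplified ACO wrapper: ants sample feature subsets from pheromone
    levels, and the classifier's accuracy on each subset is the fitness."""
    rng = random.Random(seed)
    n_features = len(X[0])
    tau = [0.5] * n_features          # one pheromone value per feature
    cache = {}                        # avoid re-evaluating repeated subsets

    def fitness(subset):
        # Accuracy minus a small size penalty (the penalty is an assumption).
        if subset not in cache:
            cache[subset] = loo_accuracy(X, y, subset) - penalty * len(subset)
        return cache[subset]

    best_subset, best_score = None, float("-inf")
    for _ in range(n_iter):
        iter_best, iter_score = None, float("-inf")
        for _ in range(n_ants):
            # Each ant includes feature j with probability tau[j].
            subset = tuple(j for j in range(n_features) if rng.random() < tau[j])
            if not subset:            # never evaluate an empty subset
                subset = (rng.randrange(n_features),)
            score = fitness(subset)
            if score > iter_score:
                iter_best, iter_score = subset, score
        if iter_score > best_score:
            best_subset, best_score = iter_best, iter_score
        # Evaporate everywhere, then reinforce features of the best subset so far.
        for j in range(n_features):
            tau[j] *= 1.0 - rho
            if j in best_subset:
                tau[j] += rho * max(best_score, 0.0)
            tau[j] = min(0.95, max(0.05, tau[j]))
    return list(best_subset), best_score
```

Calling `aco_select(*make_data())` returns the selected feature indices and their penalized score. Replacing `loo_accuracy` with a cross-validated score from any other classifier gives the corresponding wrapper variant, which is how one search strategy can be paired with several classifiers in turn.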
Results: ACO combined with Random Forest (ACO-RF) outperformed the other combinations at feature selection. The features it selected were then validated with several machine learning algorithms and a neural network. Among these, the neural network achieved the highest accuracy, at 93%.
Conclusion: The proposed ACO-RF wrapper method is an effective feature selection strategy for predicting student dropout in higher education. The method enhances model performance, especially when used with neural networks, and offers a promising approach for early identification of at-risk students.
Article Details
Copyright (c) 2025 Anuradha Singh, S. Karthikeyan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.