Comparison of Naïve Bayes and Random Forest Models in Predicting Undergraduate Study Duration Classification at the University of Lampung

Shelvira Hestina P.
Widiarti
Aang Nuryaman
Mustofa Usman

Abstract

This study compares the performance of the Naïve Bayes and Random Forest classification algorithms in predicting the study duration of undergraduate students in the Mathematics Study Program at the University of Lampung. The dataset consists of 537 graduation records from 2020–2024. The research steps are data preprocessing, data partitioning (train-test split and k-fold cross-validation), model building, and evaluation using a confusion matrix. Random Forest achieved the highest accuracy, 94.44%, outperforming Naïve Bayes, whose maximum accuracy was 92.59%. These findings suggest that Random Forest is more effective for classifying student study durations.
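The partitioning and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the class labels, the toy predictions, and the helper names (`confusion_matrix`, `accuracy`, `k_fold_indices`) are hypothetical, and the choice of 5 folds is an assumption. The sketch only shows how accuracy is read off a confusion matrix and how 537 records could be divided into folds.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a nested dict where matrix[actual][predicted] is a count."""
    counts = Counter(zip(y_true, y_pred))
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}


def accuracy(matrix):
    """Accuracy = correct predictions (the diagonal) / all predictions."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[label][label] for label in matrix)
    return correct / total


def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Hypothetical toy labels; these are NOT the study's data or class names.
labels = ["on_time", "late"]
y_true = ["on_time", "on_time", "late", "late", "on_time", "late"]
y_pred = ["on_time", "late", "late", "late", "on_time", "on_time"]

cm = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(cm)

# 537 matches the dataset size in the abstract; k = 5 is an assumed value.
folds = k_fold_indices(537, 5)

print(cm)
print(f"accuracy = {acc:.4f}")
print([len(f) for f in folds])
```

In each cross-validation round one fold would serve as the test set and the remaining folds as training data; in practice the records would usually be shuffled before being split into folds.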
