Table 2 summarizes the findings of the review process. For each paper, it highlights the educational problem or specific domain targeted for cluster-based analysis, the clustering algorithm used to tackle the issue, the data attributes used in the analysis, and the main findings.
The temporal view of the studies in Figure 2 shows a steady research focus from 2013 to 2023, with an increase in 2017, 2022, and 2023. Although researchers had focused on EDM before the COVID-19 pandemic, some have attributed the increased research in EDM to this outbreak [9][10]. Many educational institutes had to switch to and rely on digital platforms. This increased the volume of educational data being generated and created more opportunities to experiment with and analyze educational content.
Turning to the region of the experiments, Figure 3 shows that research is not restricted to a particular region and spans various countries across the globe. However, a large share of the research has emerged from India and China. Figure 4 shows that clustering-based research in education has appeared slightly more often in conference publications than in journal publications.
The literature review (Table 2) shows that several educational problems have been targeted using a wide range of clustering methods, with extensive research being carried out on analyzing student performance in a course or a degree program [6][11][12][13][14][15][16][17]. Another educational problem targeted has been to cluster students based on the class of learners [18][19][20][21][22]. Research has also focused on clustering students based on their assignment submission patterns [3], activity in Moodle [1][19][23][16], and use of Learning Management Systems [24], and on finding patterns in their engagement in a course [10]. The emerging cluster patterns have been used to visualize and refine a learning environment [7], help provide interventions to reduce drop-outs [6], understand student competency in courses [25][15][20], and provide career guidance [26]. Cluster-based research has focused not only on finding patterns in student behavior but also on analyzing teacher patterns [27][28][29]. It is evident from the reviewed literature that the results of cluster-based analysis can successfully be used to establish a framework not only to assess students' performance but also to shape pedagogy.
K-means clustering has been used far more extensively than any other approach in the reviewed literature: 30 of the 43 reviewed papers (69.77%) used K-means clustering in their research. K-means has been used to explore and tackle varied educational issues, including analyzing student performance while submitting assignments [3], monitoring student performance to avoid drop-outs [6], clustering students based on their abilities to suggest future career options [26], clustering students based on their problem-solving abilities [30], clustering online learners based on their activities [31], understanding learning styles [10], detecting teacher behavior while assessing students [29], creating varied graduate profiles [17], and analyzing variance in student performance across different subject categories [20]. Each of the other clustering approaches has been the focus of three of the reviewed studies. Hierarchical and non-hierarchical approaches have been used to understand student clusters in the subject of mathematics [7], to better understand the influence of educational variables [32], and to understand learning behavior [19]. The EM clustering algorithm has been used at a more generalized level in the reviewed studies [1][23].
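To make the K-means procedure concrete, the following is a minimal sketch of the standard assignment and update loop (Lloyd's algorithm) written with the Python standard library only. It is not taken from any of the reviewed papers: the feature pair (assignment mark, test score), the six student records, the choice of k = 2, and the function name `kmeans` are all illustrative assumptions.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:  # converged: centroids no longer move
            break
        centroids = updated
    return centroids, labels

# Hypothetical student records: (assignment mark, test score).
students = [(35, 40), (40, 38), (42, 45), (80, 85), (85, 90), (90, 88)]

centroids, labels = kmeans(students, k=2)
for student, label in zip(students, labels):
    print(student, "-> cluster", label)
print("centroids:", centroids)
```

In practice the number of clusters and the input features would be chosen from the data at hand, and features on different scales would normally be standardized before distances are computed.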
The last review question focused on the use of parameters in clustering-based EDM research. The most common parameters used in the reviewed papers are summarized in Table 3. Parameters are an important consideration whenever EDM research is carried out. The reviewed papers have used performance-based features such as CGPA and internal assessment [1][2][20], assignment marks and marks in a semester [5][20], test scores and mid and final assessments [20][17], data obtained from online platforms comprising time spent, interaction, and feedback results [29], and many other features. As Table 3 shows, student learning data has been most commonly used, featuring in almost 44% of the reviewed literature. Data from Learning Management Systems has also been extensively used to cluster students, followed by data collected through surveys and targeted questionnaires.
Several approaches are used in the field of EDM to analyze the performance of students across various domains. From the overall literature review, it is concluded that clustering is widely used to analyze various facets of educational outcomes. Consistent with the findings of Dutt et al., several papers focused on exploring data emerging from e-learning platforms [5]. However, data from intelligent tutoring systems still needs further exploration. Another similarity with the reviews of Dutt et al. and Aulakh et al. was the use of K-means in a majority of the reviewed papers [5][9]. It was observed that clustering combined with other algorithms, especially classification (forming clusters first and then performing further analysis based on the revealed characteristics), leads to better-tailored interventions [4][35][11][14].
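The cluster-then-classify idea mentioned above can be sketched as follows. This is an illustrative assumption rather than the method of any cited paper: the cluster labels are taken as already produced by an earlier clustering step, a simple k-nearest-neighbour vote stands in for whichever classifier a study might use, and the records, the `cluster_profile` descriptions, and the function name `classify` are hypothetical.

```python
import math

# Hypothetical output of a prior clustering step:
# ((assignment mark, test score), cluster id).
clustered = [
    ((35, 40), 0), ((40, 38), 0), ((42, 45), 0),
    ((80, 85), 1), ((85, 90), 1), ((90, 88), 1),
]

# Hypothetical interpretation of each cluster after inspecting its members.
cluster_profile = {0: "needs support", 1: "on track"}

def classify(student, training, k=3):
    """Assign a new student to a cluster by k-nearest-neighbour majority vote."""
    nearest = sorted(training, key=lambda rec: math.dist(student, rec[0]))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# New students who were not part of the original clustering.
new_students = [(45, 50), (78, 82)]
predicted = [classify(s, clustered) for s in new_students]

for student, cluster in zip(new_students, predicted):
    print(student, "->", cluster_profile[cluster])
```

The point of the two-stage design is that the clusters supply the labels, so a classifier can later place newly enrolled students into an existing profile without re-running the clustering.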
Although several studies have investigated patterns in student data, it is essential to apply the resulting findings to improve the educational infrastructure. Future work in cluster-based analysis can focus on using these techniques in conjunction with other data mining techniques, such as classification and regression, to tailor interventions and shape pedagogical policies. Integrating these approaches into e-learning systems with direct and instant feedback, resulting in a dynamic e-learning environment, is another possible research direction. Longitudinal studies can then be conducted to observe the long-term effects of interventions. This study had certain limitations, including the limited number of knowledge sources explored for the review. As a result, some research on cluster-based analysis published in other knowledge sources may have been missed. This leaves room for future studies to explore a wider range of knowledge sources, which would lead to a better understanding of the focus of cluster-based analysis in education.
[1] A. Bogarín, C. Romero, R. Cerezo, and M. Sánchez-Santillán, “Clustering for improving Educational process mining,” ACM Int. Conf. Proceeding Ser., pp. 11–15, 2014, doi: 10.1145/2567574.2567604.
[2] S. Wijayanti, Azahari, and R. Andrea, “K-Means cluster analysis for students graduation (case study: STMIK widya cipta dharma),” ACM Int. Conf. Proceeding Ser., vol. Part F129684, pp. 20–23, Jun. 2017, doi: 10.1145/3108421.3108430.
[3] C. Geng, W. Xu, Y. Xu, B. Pientka, and X. Si, “Identifying Different Student Clusters in Functional Programming Assignments: From Quick Learners to Struggling Students,” SIGCSE 2023 - Proc. 54th ACM Tech. Symp. Comput. Sci. Educ., vol. 1, pp. 750–756, Mar. 2023, doi: 10.1145/3545945.3569882.
[4] B. K. Francis and S. S. Babu, “Predicting Academic Performance of Students Using a Hybrid Data Mining Approach,” J. Med. Syst., vol. 43, no. 6, pp. 1–15, Jun. 2019, doi: 10.1007/S10916-019-1295-4.
[5] A. Dutt, M. A. Ismail, and T. Herawan, “A Systematic Review on Educational Data Mining,” IEEE Access, vol. 5, pp. 15991–16005, 2017, doi: 10.1109/ACCESS.2017.2654247.
[6] “Importance of Data Mining in Higher Education System.” Accessed: Dec. 19, 2023. [Online]. Available: https://www.researchgate.net/publication/269751502_Importance_of_Data_Mining_in_Higher_Education_System
[7] M. C. Desmarais and F. Lemieux, “Clustering and Visualizing Study State Sequences.” Accessed: Dec. 19, 2023. [Online]. Available: https://www.educationaldatamining.org/EDM2013/papers/rn_paper_33.pdf
[8] B. Kitchenham, “Procedures for Performing Systematic Reviews,” 2004, Accessed: Dec. 19, 2023. [Online]. Available: https://www.inf.ufsc.br/~aldo.vw/kitchenham.pdf
[9] K. Aulakh, R. K. Roul, and M. Kaushal, “E-learning enhancement through educational data mining with Covid-19 outbreak period in backdrop: A review,” Int. J. Educ. Dev., vol. 101, p. 102814, Sep. 2023, doi: 10.1016/J.IJEDUDEV.2023.102814.
[10] P. Nuankaew, P. Nasa-Ngium, and W. S. Nuankaew, “Self-Regulated Learning Styles in Hybrid Learning Using Educational Data Mining Analysis,” ICSEC 2022 - Int. Comput. Sci. Eng. Conf. 2022, pp. 208–212, 2022, doi: 10.1109/ICSEC56337.2022.10049322.
[11] R. Asif, A. Merceron, S. A. Ali, and N. G. Haider, “Analyzing undergraduate students’ performance using educational data mining,” Comput. Educ., vol. 113, pp. 177–194, Oct. 2017, doi: 10.1016/J.COMPEDU.2017.05.007.
[12] Harwati, A. P. Alfiani, and F. A. Wulandari, “Mapping Student’s Performance Based on Data Mining Approach (A Case Study),” Agric. Agric. Sci. Procedia, vol. 3, pp. 173–177, Jan. 2015, doi: 10.1016/J.AASPRO.2015.01.034.
[13] L. Chen, M. Li, and Y. Chen, “Research on Course Score Analysis Based on K-Means Clustering Algorithm,” Proc. - 2022 2nd Asia-Pacific Conf. Commun. Technol. Comput. Sci. ACCTCS 2022, pp. 485–488, 2022, doi: 10.1109/ACCTCS53867.2022.00104.
[14] M. Rahman, M. I. Ahmed, and M. S. Hossain, “Analysis of Student’s Achievement through Educational Data Mining,” Proc. - 2021 Int. Conf. Inf. Syst. Adv. Technol. ICISAT 2021, 2021, doi: 10.1109/ICISAT54145.2021.9678406.
[15] J. Liu, “The Application of K-Means Clustering Algorithm in the Quality Analysis of College English Teaching,” Proc. - 2022 Int. Conf. Educ. Netw. Inf. Technol. ICENIT 2022, pp. 1–4, 2022, doi: 10.1109/ICENIT57306.2022.00009.
[16] M. Bucos and B. Dragulescu, “Student cluster analysis based on Moodle data and academic performance indicators,” 2020 14th Int. Symp. Electron. Telecommun. ISETC 2020 - Conf. Proc., Nov. 2020, doi: 10.1109/ISETC50328.2020.9301061.
[17] L. Najdi and B. Er-Raha, “Implementing cluster analysis tool for the identification of students typologies,” Colloq. Inf. Sci. Technol. Cist, vol. 0, pp. 575–580, Jul. 2016, doi: 10.1109/CIST.2016.7804852.
[18] S. J. S. Alalawi, I. N. M. Shaharanee, and J. M. Jamil, “CLUSTERING STUDENT PERFORMANCE DATA USING k-MEANS ALGORITHMS,” J. Comput. Innov. Anal., vol. 2, no. 1, pp. 41–55, Jan. 2023, doi: 10.32890/JCIA2023.2.1.3.
[19] M. A. Job and J. Pandey, “Academic Performance Analysis Framework for Higher Education by Applying Data Mining Techniques,” ICRITO 2020 - IEEE 8th Int. Conf. Reliab. Infocom Technol. Optim. (Trends Futur. Dir.), pp. 1145–1149, Jun. 2020, doi: 10.1109/ICRITO48877.2020.9197925.
[20] “Clusters of Success: Unpacking Academic Trends with K-Means Clustering in Education.” Accessed: Dec. 19, 2023. [Online]. Available: https://www.researchgate.net/publication/375635449_Clusters_of_Success_Unpacking_Academic_Trends_with_K-Means_Clustering_in_Education
[21] M. M. Rahman, Y. Watanobe, T. Matsumoto, R. U. Kiran, and K. Nakamura, “Educational Data Mining to Support Programming Learning Using Problem-Solving Data,” IEEE Access, vol. 10, pp. 26186–26202, 2022, doi: 10.1109/ACCESS.2022.3157288.
[22] N. Iam-On and T. Boongoen, “Generating descriptive model for student dropout: a review of clustering approach,” Human-centric Comput. Inf. Sci., vol. 7, no. 1, pp. 1–24, Dec. 2017, doi: 10.1186/S13673-016-0083-0.
[23] D. Hooshyar, Y. Yang, M. Pedaste, and Y. M. Huang, “Clustering Algorithms in an Educational Context: An Automatic Comparative Approach,” IEEE Access, vol. 8, pp. 146994–147014, 2020, doi: 10.1109/ACCESS.2020.3014948.
[24] A. Bessadok, E. Abouzinadah, and O. Rabie, “Exploring students digital activities and performances through their activities logged in learning management system using educational data mining approach,” Interact. Technol. Smart Educ., vol. 20, no. 1, pp. 58–72, Feb. 2023, doi: 10.1108/ITSE-08-2021-0148.
[25] R. Gu, X. Jing, D. Zhao, L. Cai, and H. Gao, “Research and application of improved k-means algorithm based on educational big data,” Proc. - 2022 Int. Conf. Comput. Eng. Artif. Intell. ICCEAI 2022, pp. 746–749, 2022, doi: 10.1109/ICCEAI55464.2022.00157.
[26] R. Campagni, D. Merlini, R. Sprugnoli, and M. C. Verri, “Data mining models for student careers,” Expert Syst. Appl., vol. 42, no. 13, pp. 5508–5521, Aug. 2015, doi: 10.1016/J.ESWA.2015.02.052.
[27] B. Xu, M. Recker, X. Qi, N. Flann, and L. Ye, “Clustering Educational Digital Library Usage Data: A Comparison of Latent Class Analysis and K-Means Algorithms,” J. Educ. Data Min., vol. 5, no. 2, pp. 38–68, Jul. 2013, doi: 10.5281/ZENODO.3554633.
[28] R. Ordoñez-Avila, N. Salgado Reyes, J. Meza, and S. Ventura, “Data mining techniques for predicting teacher evaluation in higher education: A systematic literature review,” Heliyon, vol. 9, no. 3, p. e13939, Mar. 2023, doi: 10.1016/j.heliyon.2023.e13939.
[29] L. A. N. Muhammed, “Educational Data Mining: Analyzing Teacher Behavior based Student’s Performance,” Proc. - 2021 4th Int. Conf. Comput. Informatics Eng. IT-Based Digit. Ind. Innov. Welf. Soc. IC2IE 2021, pp. 181–185, 2021, doi: 10.1109/IC2IE53219.2021.9649366.
[30] M. Mayilvaganan and D. Kalpanadevi, “Cognitive Skill Analysis for Students through Problem Solving Based on Data Mining Techniques,” Procedia Comput. Sci., vol. 47, no. C, pp. 62–75, Jan. 2015, doi: 10.1016/J.PROCS.2015.03.184.
[31] S. Shrestha and M. Pokharel, “Machine Learning algorithm in educational data,” Int. Conf. Artif. Intell. Transform. Bus. Soc. AITB 2019, Nov. 2019, doi: 10.1109/AITB48515.2019.8947443.
[32] F. Alqasemi, S. Al-Hagree, A. Aqlan, K. M. A. Alalayah, Z. Almotwakl, and M. Hadwan, “Education Data Mining for Yemen Regions Based on Hierarchical Clustering Analysis,” 2021 Int. Conf. Technol. Sci. Adm. ICTSA 2021, Mar. 2021, doi: 10.1109/ICTSA52017.2021.9406544.
[33] A. M. De Morais, J. M. F. R. Araújo, and E. B. Costa, “Monitoring student performance using data clustering and predictive modelling,” Proc. - Front. Educ. Conf. FIE, vol. 2015-February, no. February, Feb. 2015, doi: 10.1109/FIE.2014.7044401.
[34] A. Abdulahi Hasan and H. Fang, “Data Mining in Education: Discussing Knowledge Discovery in Database (KDD) with Cluster Associative Study,” ACM Int. Conf. Proceeding Ser., May 2021, doi: 10.1145/3469213.3471319.
[35] P. Thakar, A. Mehta, and Manisha, “A unified model of clustering and classification to improve students’ employability prediction,” Int. J. Intell. Syst. Appl., vol. 9, no. 9, pp. 10–18, Sep. 2017, doi: 10.5815/IJISA.2017.09.02.
[36] M. Durairaj and C. Vijitha, “Educational Data mining for Prediction of Student Performance Using Clustering Algorithms,” Int. J. Comput. Sci. Inf. Technol., vol. 5, no. 4, pp. 5987–5991, 2014.
[37] L. Huang, “Teaching management data clustering analysis and implementation on ideological and political education of college students,” Proc. - 2016 Int. Conf. Smart Grid Electr. Autom. ICSGEA 2016, pp. 308–311, Nov. 2016, doi: 10.1109/ICSGEA.2016.61.
[38] R. K. Rambola, M. Inamke, and S. Harne, “Literature review- techniques and algorithms used for various applications of educational data mining (EDM),” 2018 4th Int. Conf. Comput. Commun. Autom. ICCCA 2018, Dec. 2018, doi: 10.1109/CCAA.2018.8777556.
[39] P. Gu and Y. Zheng, “Cluster analysis on the teaching evaluation data from college students,” 13th Int. Conf. Comput. Sci. Educ. ICCSE 2018, pp. 839–844, Sep. 2018, doi: 10.1109/ICCSE.2018.8468864.
[40] A. Ktona, D. Xhaja, and I. Ninka, “Extracting relationships between students’ academic performance and their area of interest using data mining techniques,” Proc. - 6th Int. Conf. Comput. Intell. Commun. Syst. Networks, CICSyN 2014, pp. 6–11, Mar. 2014, doi: 10.1109/CICSYN.2014.18.
[41] V. Bahel, S. Malewar, and A. Thomas, “Student Interest Group Prediction using Clustering Analysis: An EDM approach,” Proc. 2nd IEEE Int. Conf. Comput. Intell. Knowl. Econ. ICCIKE 2021, pp. 481–484, Mar. 2021, doi: 10.1109/ICCIKE51210.2021.9410741.
[42] T. G. Ramos, J. C. F. Machado, and B. P. V. Cordeiro, “Primary Education Evaluation in Brazil Using Big Data and Cluster Analysis,” Procedia Comput. Sci., vol. 55, pp. 1031–1039, Jan. 2015, doi: 10.1016/J.PROCS.2015.07.061.
[43] A. Dutt, “Clustering Algorithms Applied in Educational Data Mining,” Int. J. Inf. Electron. Eng., 2015, doi: 10.7763/IJIEE.2015.V5.513.
[44] N. V. Krishna Rao, N. Mangathayaru, and M. Sreenivasa Rao, “Evolution and prediction of radical multi-dimensional e-learning system with cluster based data mining techniques,” Proc. - Int. Conf. Trends Electron. Informatics, ICEI 2017, vol. 2018-January, pp. 701–707, Jul. 2017, doi: 10.1109/ICOEI.2017.8300793.