Deep Learning-Based Automated Classroom Slide Extraction

Authors

  • Zeeshan Azhar, Computer Science Department, Barani Institute of Technology, Rawalpindi, Pakistan
  • Hassan Chaudhry, Computer Science Department, Barani Institute of Technology, Rawalpindi, Pakistan
  • Farzana Kulsoom, Telecommunication Engineering Department, University of Engineering and Technology, Taxila, Pakistan
  • Sanam Narejo, Department of Computer Systems Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan

Keywords

Deep learning, Computer vision, Academic assistant system, YOLO, Object detection, CNN

Abstract

Automated extraction of valuable content from real-time classroom lectures holds significant potential for enhancing educational accessibility and efficiency. However, capturing the spontaneous insights of live lectures is often challenging due to rapid visual transitions, instructor movement, and diverse learning styles. This paper presents a novel approach that combines the strengths of YOLO and the Scale-Invariant Feature Transform (SIFT) to automatically extract slides from live classroom lectures. YOLO, a real-time object detection algorithm, identifies the board area, the teacher, and other objects within the video stream, while SIFT, a robust feature-based method, accurately merges key points from multiple images of the same region. The proposed method is a multi-stage process: first, YOLO detects the position of the teacher, who occludes the board within the video frames, and the teacher is then removed from the image. The board is divided into multiple segments, and SIFT is employed to remove redundant content and merge the segments. Experimental results on a diverse dataset of classroom lecture videos demonstrate the effectiveness of the proposed method in extracting slides across different environments, lecture styles, and recording conditions. The potential benefits include improved note-taking, reduced manual effort in content curation, and enhanced accessibility to lecture materials. The presented approach contributes to the broader goal of leveraging computer vision and machine learning techniques to transform traditional classroom settings into modern, interactive, and adaptive learning environments.
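The multi-stage pipeline described in the abstract (detect the teacher occluding the board, fill the occluded region from an earlier frame, then decide which board segments share content and should be merged) can be sketched with illustrative geometry. This is a minimal sketch, not the authors' implementation: the bounding-box coordinates, the frame representation, and the match-count threshold standing in for SIFT keypoint matching are all hypothetical.

```python
# Illustrative sketch of the slide-extraction pipeline. Bounding boxes
# are (x1, y1, x2, y2) tuples, as a YOLO detector might emit; frames are
# row-major lists of pixel rows. The match-count rule stands in for
# SIFT keypoint matching between board segments.

def occluded_region(board, teacher):
    """Intersection of the teacher box with the board box, or None
    if the teacher does not overlap the board."""
    x1 = max(board[0], teacher[0])
    y1 = max(board[1], teacher[1])
    x2 = min(board[2], teacher[2])
    y2 = min(board[3], teacher[3])
    if x1 >= x2 or y1 >= y2:
        return None
    return (x1, y1, x2, y2)

def fill_from_previous(frame, prev_frame, hole):
    """Remove the teacher by copying the occluded hole from an
    earlier frame in which that region was unobstructed."""
    x1, y1, x2, y2 = hole
    for y in range(y1, y2):
        frame[y][x1:x2] = prev_frame[y][x1:x2]
    return frame

def should_merge(num_good_matches, threshold=10):
    """Two board segments depicting the same content yield many
    matched keypoints and are merged; distinct slides are kept."""
    return num_good_matches >= threshold
```

In a full system the boxes would come from a trained YOLO model and `num_good_matches` from ratio-tested SIFT descriptor matches; the sketch only shows how those outputs drive the teacher-removal and segment-merging decisions.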


Published

2024-04-25

How to Cite

Azhar, Z., Chaudhry, H., Kulsoom, F., & Narejo, S. (2024). Deep Learning-Based Automated Classroom Slide Extraction. International Journal of Innovations in Science & Technology, 6(2), 380–395. Retrieved from https://journal.50sea.com/index.php/IJIST/article/view/729