Graph-Based Fingerprinting for Robust Video Sequence Identification under Temporal Reordering

Samra Naseer; Syeda Hafsa Ali

doi:10.33411/IJIST/1839

Authors

Ms. Samra Naseer, Capital University of Science and Technology, https://orcid.org/0009-0005-5859-8505
Syeda Hafsa Ali, CUST University

Keywords:

Video Fingerprinting, Graph-Based Representation, Near-Duplicate Detection, Temporal Robustness, Multimedia Retrieval.

Abstract

Video fingerprinting plays a vital role in near-duplicate detection, copyright protection, and multimedia retrieval. Most existing fingerprinting techniques rely on frame-level or sequential representations and assume that the temporal order of video content remains intact. However, real-world video reuse often involves temporal distortions such as segment reordering, frame dropping, and partial clip extraction, which significantly degrade the performance of sequence-dependent methods. This paper proposes a Graph-Based Video Fingerprinting Framework (GBVFF) for robust video identification under such distortions. Unlike conventional sequence-dependent approaches, the proposed method models videos as graph structures to preserve contextual relationships independent of temporal order. The framework combines Mean Canberra Distance for similarity estimation with KL-divergence-based selection of representative fingerprint features. Experimental evaluation on benchmark datasets, including YouTube-8M and WebVid, shows that GBVFF achieves a 12.4% improvement in accuracy, 14.8% higher precision, and a 27.6% reduction in false positive rate compared to state-of-the-art methods. The results indicate that graph-based representations enhance robustness against temporal perturbations, making the approach suitable for real-world video retrieval, duplicate detection, and copyright protection systems.
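
The two measures named in the abstract, and the idea of an order-independent graph representation, can be illustrated with a small Python sketch. This is a toy illustration under stated assumptions, not the GBVFF implementation: the abstract does not define "Mean Canberra Distance", so it is assumed here to be the Canberra distance averaged over vector components, and the function names (`mean_canberra`, `kl_divergence`, `graph_signature`) and the sorted-edge-weight signature are hypothetical choices made only for this example.

```python
import math
from itertools import combinations
from typing import List, Sequence


def mean_canberra(u: Sequence[float], v: Sequence[float]) -> float:
    """Canberra distance averaged over components (assumed definition).

    Each term is |u_i - v_i| / (|u_i| + |v_i|); terms where both
    components are zero contribute nothing. The result lies in [0, 1].
    """
    if len(u) != len(v) or len(u) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total / len(u)


def kl_divergence(p: Sequence[float], q: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(p || q) for two non-negative histograms of equal length.

    Both inputs are normalised to sum to 1; q is floored at eps to
    avoid division by zero (a smoothing choice made for this sketch).
    """
    if len(p) != len(q) or len(p) == 0:
        raise ValueError("histograms must be non-empty and of equal length")
    p_sum, q_sum = sum(p), sum(q)
    total = 0.0
    for a, b in zip(p, q):
        a, b = a / p_sum, b / q_sum
        if a > 0:
            total += a * math.log(a / max(b, eps))
    return total


def graph_signature(segments: Sequence[Sequence[float]]) -> List[float]:
    """Toy graph fingerprint: segments are nodes, every pair is an edge
    weighted by mean Canberra distance, and the signature is the sorted
    list of edge weights. Sorting makes it independent of segment order.
    """
    return sorted(mean_canberra(a, b) for a, b in combinations(segments, 2))


if __name__ == "__main__":
    # Hypothetical per-segment feature vectors for a short clip.
    clip = [[1.0, 2.0], [2.0, 1.0], [4.0, 4.0]]
    reordered = [clip[2], clip[0], clip[1]]  # same segments, shuffled
    print(graph_signature(clip) == graph_signature(reordered))
```

Because the signature depends only on the set of pairwise relationships and not on the position of each segment, shuffling the segments leaves it unchanged, whereas a frame-by-frame sequence comparison of the same two clips would not match.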

Author Biography

Ms. Samra Naseer, Capital University of Science and Technology

Ms. Samra Naseer is an academic and researcher in the field of Computer Science. She completed her MPhil in Computer Science at Quaid-i-Azam University, Islamabad, in 2022. She currently serves as an Associate Lecturer at Capital University of Science and Technology (CUST), where she teaches and supervises undergraduate research projects.

Her research primarily focuses on:

Multimedia Information Retrieval
Computer Vision
Artificial Intelligence

She is particularly interested in developing intelligent systems that can analyze, retrieve, and interpret multimedia data efficiently.
