A Survey on Audio Deepfake Detection: Techniques, Datasets, and Key Challenges

Authors

  • Imran Javed, CUST
  • Aamer Nadeem, CUST

Keywords:

Audio Deepfake Detection, Machine Learning, Deep Learning, Spoofing Attacks, Speech Synthesis, Anti-Spoofing

Abstract

Audio deepfake detection has become a critical research area in response to the rapid proliferation of deep learning-based speech synthesis and voice conversion technologies. This survey systematically reviews advances from 2019 to 2025 in audio deepfake detection, covering attack typologies, detection methodologies, benchmark datasets, and open challenges. A total of 40 peer-reviewed studies were selected through a structured inclusion/exclusion protocol applied to searches of IEEE Xplore, ACM Digital Library, Google Scholar, and ScienceDirect. The reviewed detection systems report accuracies of 87–98.5% across benchmark datasets, Equal Error Rates (EER) of approximately 1.8% to 8.3%, and tandem detection cost function (t-DCF) values between 0.041 and 0.212 on ASVspoof 2019. Deep learning approaches, particularly residual networks (ResNet), squeeze-excitation networks (SENet), and hybrid convolutional-recurrent architectures, consistently outperform classical machine learning methods such as support vector machines (SVM) and Gaussian mixture models (GMM) by 5–12% in accuracy under matched conditions. However, cross-dataset and cross-language generalization remain critical unresolved challenges. The survey identifies self-supervised learning and imitation-based detection as high-priority directions for future research.
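
For readers unfamiliar with the EER figures quoted in the abstract: the EER is the error rate at the decision threshold where the false acceptance rate (spoofed audio accepted) equals the false rejection rate (bona fide audio rejected). The following is a minimal sketch of that idea, not the evaluation code used by the surveyed studies. The function name `equal_error_rate`, the convention that higher scores mean "more likely bona fide", and the example scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Assumption: higher scores mean "more likely bona fide".
    Returns (eer, threshold) for the threshold where the false acceptance
    and false rejection rates are closest to each other.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, best_eer, best_thr = None, None, None
    for thr in thresholds:
        # False rejection: a bona fide trial scored below the threshold.
        frr = sum(s < thr for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed trial scored at or above the threshold.
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


if __name__ == "__main__":
    # Hypothetical detector scores, invented for this example only.
    bonafide = [0.9, 0.8, 0.7, 0.35]
    spoof = [0.1, 0.2, 0.3, 0.6]
    eer, thr = equal_error_rate(bonafide, spoof)
    print(f"EER = {eer:.2%} at threshold {thr}")  # EER = 25.00% at threshold 0.6
```

With real evaluation sets the threshold sweep is usually done on an interpolated ROC or DET curve; the exhaustive sweep here is chosen only because it keeps the definition visible. The t-DCF additionally weights detector errors by the costs and priors of a downstream speaker verification system and is not covered by this sketch.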

Published

2026-05-17

How to Cite

Javed, Imran, & Nadeem, Aamer. (2026). A Survey on Audio Deepfake Detection: Techniques, Datasets, and Key Challenges. International Journal of Innovations in Science & Technology, 8(3), 574–582. Retrieved from https://journal.50sea.com/index.php/IJIST/article/view/1808