Classifying Wildlife Acoustic Signals using a Deep Learning Approach

Sunila Sheikh; Umer Rashid

Authors

Sunila Sheikh, Quaid-i-Azam University
Umer Rashid, Quaid-i-Azam University

Keywords:

CNN Bi-GRU Architecture, Context-Aware Sound Classification, Fuzzy Logic Decision Layer, Hybrid Deep Learning Model, Environmental Acoustic Monitoring

Abstract

Monitoring wildlife and environmental conditions in national parks is essential for ecological research, biodiversity conservation, and public safety. This study proposes a contextual sound-based monitoring framework that addresses the limitations of vision-based systems in the low-light and occluded environments common in wildlife areas. The proposed approach integrates a hybrid deep learning architecture that combines Convolutional Neural Networks (CNNs) for spatial feature extraction with a Bidirectional Gated Recurrent Unit (BiGRU) for temporal sequence modeling, followed by a fuzzy logic decision layer for high-level contextual interpretation. To ensure diversity and robustness, multiple open-source datasets, including ESC-50, UrbanSound8K, FSC22, and Scream/Non-Scream datasets, are preprocessed, harmonized, and merged into a unified dataset comprising 15,811 audio clips across 16 low-level sound classes. The dataset includes alarming sounds, representing complex acoustic environments relevant to wildlife and park monitoring. The model employs a hierarchical classification strategy. First, the CNN-BiGRU network performs low-level sound event classification; a fuzzy inference system then maps its outputs into four high-level contextual categories: Illegal Activity, Human Distress, Natural Hazard, and Safe Activity. On the UrbanSound8K dataset, the model achieves an accuracy of 95.80%, precision of 95.95%, recall of 96.14%, weighted F1-score of 95.80%, and ROC-AUC of 99.67%. On the harmonized dataset, it achieves an accuracy of 91.30%, precision of 88.11%, recall of 86.23%, weighted F1-score of 87.91%, and ROC-AUC of 99.50%. The model therefore remains competitive on the harmonized dataset: accuracy and ROC-AUC stay within 5 percentage points of the UrbanSound8K results, while precision, recall, and F1-score fall by roughly 8 to 10 points. These findings highlight the effectiveness of contextual sound analysis in enhancing situational awareness and supporting intelligent surveillance systems for wildlife and environmental monitoring.
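
The second stage of the hierarchical strategy described in the abstract, where low-level class outputs are mapped to the four contextual categories, can be illustrated with a minimal sketch. The abstract does not list the 16 low-level classes or the fuzzy rules, so the class names and membership weights below are hypothetical, and the aggregation is a simple weighted sum rather than the authors' fuzzy inference system.

```python
# Illustrative only: class names and membership weights are assumptions,
# not taken from the paper. Each low-level class belongs to one or more
# high-level categories with a degree of membership in [0, 1].
CATEGORIES = ("Illegal Activity", "Human Distress", "Natural Hazard", "Safe Activity")

MEMBERSHIP = {
    "gunshot": {"Illegal Activity": 1.0},
    "chainsaw": {"Illegal Activity": 0.8, "Safe Activity": 0.2},
    "scream": {"Human Distress": 1.0},
    "fire": {"Natural Hazard": 1.0},
    "thunderstorm": {"Natural Hazard": 0.7, "Safe Activity": 0.3},
    "birdsong": {"Safe Activity": 1.0},
}


def contextual_scores(class_probs):
    """Aggregate low-level class probabilities into category scores.

    Each probability is weighted by the class's degree of membership
    in a category, and the weighted values are summed per category.
    """
    scores = {category: 0.0 for category in CATEGORIES}
    for label, prob in class_probs.items():
        for category, weight in MEMBERSHIP.get(label, {}).items():
            scores[category] += weight * prob
    return scores


def interpret(class_probs):
    """Return the highest-scoring category together with all scores."""
    scores = contextual_scores(class_probs)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Stand-in for the softmax output of a low-level classifier.
    label, scores = interpret({"gunshot": 0.6, "birdsong": 0.3, "chainsaw": 0.1})
    print(label, scores)
```

In this sketch a clip dominated by a hypothetical "gunshot" class is assigned to Illegal Activity even when some probability mass falls on benign classes. A full fuzzy inference system would add membership functions over the probabilities, a rule base, and a defuzzification step.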

Author Biography

Umer Rashid, Quaid-i-Azam University

Associate Professor, Department of Computer Science, QAU
