Dynamic Malware Detection Using Effective Machine Learning Models with Feature Selection Techniques

Inam Ullah Khan; Fida Muhammad Khan; Zeeshan Ali Haider; Saba Khattak; Gulshan Naheed; Sana Shaoor Kiani

Authors

Inam Ullah Khan Department of Computer Science, Qurtuba University of Science & Information Technology, Peshawar, Pakistan
Fida Muhammad Khan Department of Computer Science, Qurtuba University of Science & Information Technology, Peshawar, Pakistan
Zeeshan Ali Haider Department of Computer Science, Qurtuba University of Science & Information Technology, Peshawar, Pakistan
Saba Khattak Department of Computer Science, University of Science & Technology, Bannu, Pakistan
Gulshan Naheed Higher Education Departments KPK, Peshawar
Sana Shaoor Kiani Ministry of Law & Justice Pakistan

Keywords:

Cyber Security, Machine Learning, Cyber-Attacks, Random Forest, Decision Trees, KNN, Gaussian Naive Bayes (NB), Malicious Threats

Abstract

Dynamic malware is a type of self-modifying malware, which makes it difficult to analyze while it is running. It can change its behavior depending on the environment and the context of execution. The goal of this study was to detect dynamic malware on Android devices using machine learning models combined with feature selection techniques. With new malicious software emerging daily, relying solely on manual heuristic analysis has become ineffective. To address this limitation, the study used dynamic detection methods in which machine learning models identify the events of interest. This involved replicating an environment in which the malware could be executed and its behavior recorded in reports. The reports were then transformed into sparse vector representations so that machine learning techniques could be applied to them. Seven models, namely K-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF), AdaBoost, Stochastic Gradient Descent (SGD), Extra Trees, and Gaussian Naive Bayes (NB), were trained to detect dynamic malware in its early stages. The Random Forest, Stochastic Gradient Descent, Extra Trees, and Gaussian Naive Bayes classifiers achieved the highest accuracy among these models. The study supports the use of machine learning-based automated behavior analysis for malware detection, given the complexities involved in the dynamic behavioral analysis of malicious software.
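
The step of turning behavior reports into sparse vectors can be illustrated with a minimal sketch. It is not the study's implementation: the event names, the sample reports, and the helper functions build_vocabulary and to_sparse_vector are all hypothetical, and the sketch assumes each report can be reduced to a list of logged event names.

```python
from collections import Counter

# Hypothetical behavior reports: each one lists the events logged while
# a single app ran in the analysis environment (event names are invented).
reports = [
    ["open_file", "send_sms", "send_sms", "connect_socket"],
    ["open_file", "read_contacts"],
]


def build_vocabulary(reports):
    """Map each distinct event name to a column index, in sorted order."""
    events = sorted({event for report in reports for event in report})
    return {event: index for index, event in enumerate(events)}


def to_sparse_vector(report, vocabulary):
    """Return {column index: count}, storing only events that occurred.

    Events missing from the vocabulary are ignored.
    """
    counts = Counter(event for event in report if event in vocabulary)
    return {vocabulary[event]: count for event, count in counts.items()}


vocabulary = build_vocabulary(reports)
vectors = [to_sparse_vector(report, vocabulary) for report in reports]
```

Only non-zero counts are stored, which keeps the representation compact when the vocabulary of possible events is much larger than the set any one sample triggers. In practice such vectors would typically be stacked into a sparse matrix, with a label per report, before training classifiers like those listed in the abstract.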
