Improving Software Security Through an LLM-Based Vulnerability Detection Model

Mohsin Sami, Kashif Nasr, Saira Gillani, Rabia Tehseen

Department of Computer Science, University of Central Punjab, Lahore, Pakistan

Keywords:

Software Security, Large Language Models, Vulnerability Detection, Transformer Networks, Code Analysis, Explainable AI, Cybersecurity, Deep Learning, Static Analysis, Big-Vul Dataset

Abstract

Software vulnerabilities pose critical risks to modern digital infrastructure, including data breaches, unauthorized access, and lost revenue. Although traditional static and dynamic analysis tools are effective at discovering vulnerability patterns, they struggle to recognize complex, context-dependent, and logic-based security flaws that emerge as software systems evolve. This research presents a Large Language Model-based Vulnerability Detection Model (LLM-VDM) that enhances software security through intelligent, context-aware code analysis. The proposed model uses a transformer-based architecture adapted to the Juliet, Big-Vul, and Devign benchmark datasets, and it was evaluated on these datasets to assess its performance and how well it integrates code-semantic and code-contextualization methods. Experimental results show that LLM-VDM outperforms both the static-analysis baseline (SonarQube) and the deep learning competitors (Devign, CodeBERT, and CodeT5), attaining 91.2% accuracy, a 90.0% F1-score, and an AUC of 0.94. Furthermore, an integrated explainability module pinpoints vulnerable code and outlines remediation strategies, making the model's predictions easier to interpret. These findings indicate that LLM-based vulnerability detection can give software developers more secure, adaptive, explainable, and scalable tooling that meets the needs of contemporary software development.
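For readers less familiar with the reported metrics, the following minimal Python sketch shows how accuracy, F1-score, and AUC are commonly computed for a binary vulnerable (1) / non-vulnerable (0) classifier. It is an illustrative assumption, not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical, and only the Python standard library is used.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive (vulnerable = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both 0, so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random vulnerable sample scores higher
    than a random non-vulnerable one (ties count as half), i.e. the normalized
    Mann-Whitney U statistic. Quadratic in sample count; fine for illustration."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = vulnerable, 0 = non-vulnerable.
    y_true = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]  # model's predicted P(vulnerable)
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(f"accuracy={accuracy(y_true, y_pred):.3f} "
          f"F1={f1_score(y_true, y_pred):.3f} "
          f"AUC={roc_auc(y_true, scores):.3f}")
    # prints: accuracy=0.667 F1=0.667 AUC=0.889
```

Accuracy and F1 depend on the chosen threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds. This is why, on the toy data above, AUC can be noticeably higher than accuracy.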
