A Systematic Literature Review on Automated Unit Test Generation Techniques, Algorithms, and Tools

Syed Mukhtar ul Hassan Mukhtar; Muhammad Hammad Ali Hammad; Muhammad Abbas Abbas; Mudassar Adeel Ahmed

doi:10.33411/IJIST/1809

Authors

Syed Mukhtar ul Hassan Mukhtar Capital University of Science and Technology
Muhammad Hammad Ali Hammad Capital University of Science and Technology
Muhammad Abbas Abbas Capital University of Science and Technology
Mudassar Adeel Ahmed Capital University of Science and Technology, Islamabad, Pakistan

Keywords:

Automated Unit Test Generation, Software Testing, Systematic Literature Review, Natural Language Processing, Large Language Models, Test Verification, Test Oracle Generation, Mutation Testing, Assertion Generation

Abstract

Automated unit test generation has advanced rapidly with methods based on Natural Language Processing (NLP) and Large Language Models (LLMs). Nonetheless, the field remains fragmented in its techniques, algorithms, tools, and verification practices, which makes it difficult to determine where significant progress has been made and which major limitations persist. This paper presents a systematic literature review of 48 primary studies on automated unit test generation, analyzed along four dimensions: generation techniques, underlying algorithms, tools and frameworks, and verification or post-generation improvement mechanisms. The results indicate that tools and frameworks account for the largest number of studies, with 27 (56.3%), followed by test generation techniques and algorithms with 14 (29.2%), verification and evaluation approaches with 10 (20.8%), and assertion, repair, and test suite improvement mechanisms with 8 (16.7%); the counts sum to more than 48, which indicates that some studies are counted in more than one category. The reviewed evidence indicates that recent LLM-based and hybrid methods achieve better contextual generation and workflow integration, although they depend heavily on validation, repair, and oracle quality. For example, mutation-guided prompting has been reported to reach a mutation score of 93.57% on synthetic buggy code and to detect up to 28% more faulty human-written snippets, and an industrial LLM-based test improvement effort has reported 75% build success, a 57% reliable pass rate, a 25% coverage increase, and 73% acceptance by engineers. Overall, the review indicates that automated unit test generation is moving away from isolated test generation and toward test generation integrated into quality-engineering pipelines. Future progress is likely to depend less on generating raw tests and more on combining generation with verification, assertion enhancement, repair, and reproducible evaluation, and on deploying these capabilities in industrial settings.
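
The category shares quoted in the abstract can be checked against the 48-study total. The following is a minimal sketch in Python, not part of the paper: the shortened category labels and the helper name `share_percent` are illustrative, and half-up rounding to one decimal place is an assumption chosen because it reproduces the reported 56.3%.

```python
from decimal import Decimal, ROUND_HALF_UP

TOTAL_STUDIES = 48  # primary studies reported in the abstract

# Category counts as reported in the abstract; labels are shortened here.
category_counts = {
    "tools and frameworks": 27,
    "generation techniques and algorithms": 14,
    "verification and evaluation": 10,
    "assertion, repair, and suite improvement": 8,
}


def share_percent(count, total=TOTAL_STUDIES):
    """Return count/total as a percentage, rounded half-up to one decimal."""
    pct = Decimal(count) * 100 / Decimal(total)
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


shares = {name: share_percent(n) for name, n in category_counts.items()}

# Sum of per-category counts; a value above TOTAL_STUDIES means that
# some studies are assigned to more than one category.
total_assignments = sum(category_counts.values())

for name, pct in shares.items():
    print(f"{name}: {category_counts[name]} studies ({pct}%)")
print(f"category assignments: {total_assignments} across {TOTAL_STUDIES} studies")
```

Under these assumptions the computed shares match the reported percentages, and the 59 category assignments across 48 studies are consistent with the categories not being mutually exclusive.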

Author Biographies

Syed Mukhtar ul Hassan Mukhtar, Capital University of Science and Technology

Student, Department of Software Engineering

Muhammad Hammad Ali Hammad, Capital University of Science and Technology

Student, Department of Software Engineering

Muhammad Abbas Abbas, Capital University of Science and Technology

Student, Department of Software Engineering
