AI-Driven Software Testing Optimization: A Comprehensive Machine Learning Framework for Automated Test Generation and Oracle Problem Resolution

Zoobia; Aamna; Wajiha; Qurat Ul Ain

doi:10.33411/IJIST/1799

Authors

Zoobia, Department of Software Engineering, Bahria University, Karachi Campus
Aamna, Department of Software Engineering, Bahria University, Karachi Campus
Wajiha, Department of Software Engineering, Bahria University, Karachi Campus
Qurat Ul Ain, Department of Computer Science, National University of Modern Languages, Islamabad, Pakistan

Keywords:

Software Testing, Machine Learning, Test Optimization, Oracle Problem, Automated Testing, Reinforcement Learning

Abstract

Software testing is often one of the most expensive and time-consuming phases of software development, and it becomes even harder to manage in fast-paced, continuous development environments. This study presents an AI-driven software testing framework that combines supervised, unsupervised, and reinforcement learning to improve automated test generation, fault prediction, test prioritization, and test oracle decision-making. Beyond improving testing efficiency, the framework is designed to address the test oracle problem, that is, determining the expected outputs for generated test cases. To this end, it integrates statistical anomaly detection, metamorphic relation discovery, and behavioral pattern recognition within a single architecture. The framework was evaluated on 15 open-source projects and 3 industrial case studies to assess its practical applicability across different testing scenarios. Experimental results show that it reduces overall testing time by approximately 34.7% while maintaining a fault detection accuracy of 97.8%. In addition, the proposed oracle mechanism achieved approximately 94.2% accuracy in identifying test failures, and the reinforcement learning module improved test prioritization effectiveness by approximately 28% over baseline methods. These results indicate that combining hybrid learning approaches in one framework can provide a more adaptive and effective solution for modern software quality assurance, although further improvements are required for highly dynamic systems.
