Analyzing ML-Based IDS over Real-Traffic


  • Shafqat Ali Siyyal, Department of Telecommunication, Mehran University of Engineering and Technology, Jamshoro, Pakistan.
  • Faheem Yar Khuawar, Department of Telecommunication, Mehran University of Engineering and Technology, Jamshoro, Pakistan.
  • Erum Saba, Information Technology Center, Sindh Agriculture University, Tandojam, Pakistan.
  • Abdul Latif Memon, Department of Telecommunication, Mehran University of Engineering and Technology, Jamshoro, Pakistan.
  • Muhammad Raza Shaikh, Department of Telecommunication, Mehran University of Engineering and Technology, Jamshoro, Pakistan.


Keywords: ML, IDS, Dataset, Dataset Generation Method, Traffic Capture, Normal & Attack Traffic


The rapid growth of computer networks has caused a significant increase in malicious traffic, prompting the use of Intrusion Detection Systems (IDSs) to protect against this ever-growing attack traffic. Many IDSs have been developed, each with its own strengths and weaknesses. Most IDS research and development relies on simulated and outdated datasets because real datasets are unavailable. For instance, KDD '99 and CIC-IDS-18, both widely used by researchers, do not sufficiently represent real-traffic scenarios. Moreover, such static datasets are generated once and cannot keep pace with rapid changes in network traffic patterns. To overcome these problems, we propose a framework for generating a full-featured, unbiased, up-to-date custom dataset based on real traffic. In this paper, we describe the complete methodology, covering the network testbed, data acquisition, and attack scenarios. The generated dataset contains more than 70 features and covers several types of attacks, namely DoS, DDoS, Portscan, Brute-Force, and Web attacks. We then compare the custom-generated dataset with various available datasets on seven factors: updates, practicality of generation, realness, attack diversity, flexibility, availability, and interoperability. Additionally, we train different ML-based classifiers on the custom-generated dataset and evaluate them using performance metrics. The generated dataset is publicly available to all users. This research is expected to help researchers develop effective IDSs and up-to-date datasets based on real traffic.
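The abstract does not name the performance metrics used, so the following is only a minimal sketch of how a classifier's predictions on labelled traffic might be scored, assuming the commonly used accuracy, precision, recall, and F1 measures. The function name `binary_metrics`, the `"attack"`/`"normal"` labels, and the sample predictions are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="attack"):
    """Return accuracy, precision, recall and F1, treating `positive` as the attack class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted attack: true positive if it really was one, else false positive.
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted normal: false negative if it was an attack, else true negative.
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground-truth labels and classifier predictions for eight flows.
y_true = ["attack", "attack", "normal", "normal",
          "attack", "normal", "normal", "attack"]
y_pred = ["attack", "normal", "normal", "attack",
          "attack", "normal", "normal", "attack"]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For a multi-class dataset such as one covering DoS, DDoS, Portscan, Brute-Force, and Web attacks, the same function could be applied once per attack label in a one-vs-rest fashion; that choice is likewise an assumption rather than the authors' stated procedure.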




How to Cite

Siyyal, S. A., Faheem Yar Khuawar, Erum Saba, Abdul Latif Memon, & Muhammad Raza Shaikh. (2022). Analyzing ML-Based IDS over Real-Traffic. International Journal of Innovations in Science & Technology, 4(3), 621–640. Retrieved from