Ensemble Learning Approach for Multi-Class Intrusion Detection in IoT Networks Using CIC-IoT-2023 Dataset
Keywords:
CIC-IoT-2023, Class Imbalance, Ensemble Learning, Feature Selection, IoT Security, Intrusion Detection, LightGBM, Multi-Class IDS, Raspberry Pi Deployment
Abstract
IoT devices are everywhere: smart doorbells, factory sensors, and even hospital pumps. Yet the $20 chips inside often ship with decade-old firmware and unchanged default passwords. Attacks have tripled in three years, and DDoS floods are now routine. Legacy intrusion-detection systems drown in millisecond burst traffic. Machine learning promises relief, but the CIC-IoT-2023 dataset is hard to learn from because its labels are both fragmented and severely imbalanced. It holds 46 million flows across 34 attack types with a 408:1 imbalance ratio. Some classes account for less than 0.01% of flows, and standard classifiers tend to over-predict ‘DDoS’ while ignoring the slow, surgical threats. We address this through disciplined data preparation. First, we collapse the 34 labels into eight behaviorally coherent families. Second, we prune the 46 features to 30 using correlation analysis and Random Forest Gini importance. Third, we benchmark seven models (five base learners and two ensembles) on identical stratified splits, without synthetic data or a GPU. The soft-voting ensemble peaks at 99.37% macro-F1 with an inference time of 3.9 microseconds, achieving real-time performance on a Raspberry Pi 4. LightGBM delivers 99.03% F1 in 82 s of training, giving up 0.34 percentage points of F1 for a 5× speed improvement. We release the 30-feature extractor, stratified splits, and training scripts for community benchmarking.
License
Copyright (c) 2025 50sea

This work is licensed under a Creative Commons Attribution 4.0 International License.