Network Traffic Classification in SDN Networks Using PCA Integrated Boosting Algorithms
Keywords:
Network, Classification, SDN, XGBM, LGBM, PCA, confusion matrix
Abstract
In recent years, internet traffic has increased with the introduction of new services and applications, and managing network traffic has become more challenging. To address this, several network traffic classification techniques have been proposed, and many researchers have applied advanced deep learning and machine learning models to the problem. Boosting methods can also be applied to this task. Boosting algorithms build on the decision tree concept; they require little training time, and model training does not require a powerful system. Thus, in the proposed study, boosting algorithms such as the Extreme Gradient Boosting Model (XGBM), the Light Gradient Boosting Model (LGBM), CatBoost, and AdaBoost are integrated with Principal Component Analysis (PCA) to classify network traffic. The models are compared in terms of confusion matrix, accuracy, precision, recall, and F-measure. The Network Traffic Android Malware dataset used in the study is publicly available on Kaggle.com. The simulations use Python and its libraries, including scikit-learn, TensorFlow, Keras, and Matplotlib. Before PCA integration, XGBM achieved 90.41% accuracy, 96.39% precision, 89.72% recall, and 92.91% F-measure, while LGBM achieved 89.02% accuracy, 90.04% precision, 89.8% recall, and 89.83% F-measure. CatBoost attained 86.87% accuracy, 89.43% precision, 83.97% recall, and 86.61% F-measure, and AdaBoost obtained 83.07% accuracy, 85.25% precision, 80% recall, and 82.58% F-measure. Integrating the boosting algorithms with PCA substantially improved the results. With PCA, XGBM (XGBoost) reached 95.56% accuracy, 96.72% precision, 94.39% recall, and 93.91% F-measure.
Similarly, the performance of the LGBM model improved with the integration of PCA: it achieved an accuracy of 93.41%, precision of 93.72%, recall of 92.39%, and F-measure of 92.91%. PCA-integrated CatBoost also improved, achieving an accuracy of 94.41%, precision of 93.72%, recall of 92.39%, and F-measure of 93.91%. AdaBoost likewise improved, achieving an accuracy of 94.56%, precision of 94.72%, recall of 93.39%, and F-measure of 93.91%. Across all simulations and performance evaluations, integrating PCA with the boosting algorithms proved to be a simple way to improve their performance, with each model improving by up to approximately 10%.
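The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it uses scikit-learn's AdaBoostClassifier as the only boosting model, a synthetic dataset from make_classification in place of the Kaggle dataset, and assumed settings (50 estimators, standardization before PCA, PCA keeping 95% of the variance). The helper names metrics_from_confusion and evaluate are hypothetical. The scores it prints therefore say nothing about the figures reported above; the sketch only shows how a boosting model with and without PCA could be evaluated from the confusion matrix.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def metrics_from_confusion(cm):
    """Derive accuracy, precision, recall and F-measure from a 2x2 confusion matrix.

    Rows are true classes and columns are predicted classes, with class 1
    treated as the positive class (scikit-learn's layout for labels=[0, 1]).
    """
    tn, fp, fn, tp = (float(v) for v in np.asarray(cm).ravel())
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit the model, then score it on the held-out split (hypothetical helper)."""
    model.fit(X_train, y_train)
    cm = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])
    return metrics_from_confusion(cm)


# Synthetic stand-in for the traffic dataset (assumption: binary labels).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    # Boosting model trained directly on the raw features.
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Same model trained on principal components; a float n_components keeps
    # enough components to explain that fraction of the variance.
    "PCA + AdaBoost": make_pipeline(
        StandardScaler(),
        PCA(n_components=0.95),
        AdaBoostClassifier(n_estimators=50, random_state=0),
    ),
}

results = {name: evaluate(model, X_train, X_test, y_train, y_test)
           for name, model in models.items()}

for name, scores in results.items():
    summary = ", ".join(f"{k}={v:.4f}" for k, v in scores.items())
    print(f"{name}: {summary}")
```

Wrapping the scaler, PCA, and classifier in one pipeline means the PCA projection is fitted on the training split only, which avoids leaking information from the test split. Other boosting libraries that follow the scikit-learn estimator interface could be substituted for AdaBoostClassifier in the same dictionary.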

Copyright (c) 2025 50Sea

This work is licensed under a Creative Commons Attribution 4.0 International License.