Securing Cloud Data: An Approach for Cloud Computing Data Categorization Based on Machine Learning
Keywords:
Random Forest, Naïve Bayes, Data classification, KNN, Cloud Computing, SVM
Abstract
Introduction/Importance of Study: Cloud computing (CC) is a novel approach that allows users to store data on remote servers that are accessible through the internet. It makes moving and retrieving vital and personal data simple, and as a result the demand for it is rising daily. CC can be used to store a variety of data, including multimedia content, document files, and financial transactions. Furthermore, by lowering operating and maintenance expenses, CC lessens the reliance of services on local storage.
Novelty statement: Current systems encrypt all data with a single key size, without regard to how private the data is. This results in higher processing costs and longer processing times. Furthermore, none of these methods improves secrecy, and they achieve only a low accuracy rate in data classification.
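As a rough illustration of the alternative to a single key size, the sketch below picks a key length from a document's sensitivity level. The level names are the three used in this abstract; the mapping to 128, 192, and 256 bits and the helper names `key_bits_for` and `generate_key` are assumptions made for illustration only, since the abstract does not state which key sizes the study uses.

```python
import secrets

# Hypothetical mapping from sensitivity level to key length in bits.
# The abstract does not specify key sizes; these values are assumptions.
KEY_BITS_BY_LEVEL = {
    "basic": 128,
    "confidential": 192,
    "highly confidential": 256,
}


def key_bits_for(level: str) -> int:
    """Return the assumed key length (in bits) for a sensitivity level."""
    normalized = level.strip().lower()
    if normalized not in KEY_BITS_BY_LEVEL:
        raise ValueError(f"unknown sensitivity level: {level!r}")
    return KEY_BITS_BY_LEVEL[normalized]


def generate_key(level: str) -> bytes:
    """Generate a random key whose length matches the sensitivity level."""
    return secrets.token_bytes(key_bits_for(level) // 8)


if __name__ == "__main__":
    for level in KEY_BITS_BY_LEVEL:
        key = generate_key(level)
        print(f"{level}: {len(key) * 8}-bit key")
```

Under this assumption, less sensitive data gets a shorter key and therefore cheaper encryption, while the longest key is reserved for the most sensitive data.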
Material and Method: This study presents a cloud computing strategy for handling data sensitivity that is based on automated data classification. The suggested model uses Random Forest (RF), Naïve Bayes (NB), k-nearest neighbor (KNN), and support vector machine (SVM) classifiers to achieve automated feature extraction. The methodology is designed to operate effectively across three sensitivity levels: basic, confidential, and highly confidential.
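To make the idea of sensitivity classification concrete, here is a minimal sketch of one of the four classifier families named above: a multinomial Naïve Bayes text classifier with add-one smoothing, written with the Python standard library only. It is not the study's implementation. The training sentences are invented toy data (not taken from Reuters-21578), and the function names `tokenize`, `train_nb`, and `predict_nb` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Invented toy documents, two per sensitivity level (not the study's data).
TRAIN = [
    ("public press release about product launch", "basic"),
    ("newsletter announcing office holiday schedule", "basic"),
    ("internal salary report for employees", "confidential"),
    ("employee contract and salary details", "confidential"),
    ("customer password and credit card number", "highly confidential"),
    ("bank account number and password reset", "highly confidential"),
]


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real preprocessing."""
    return text.lower().split()


def train_nb(samples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_docs.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAIN)

if __name__ == "__main__":
    for doc in ("credit card password", "salary report", "press release"):
        print(doc, "->", predict_nb(model, doc))
```

A realistic pipeline would add the usual text preprocessing and term weighting before classification and would evaluate on held-out data; the sketch only shows how a predicted label could fall into one of the three levels.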
Results and Discussion: The experiments were performed on the Reuters-21578 dataset, which consists of 21,578 documents. The simulation results showed that three of the proposed models achieved accuracy rates of 97%, 96%, and 95%, and the findings indicate that SVM, RF, and KNN outperform NB in classification performance.
Concluding Remarks: The study also offers helpful recommendations for researchers and for cloud service providers such as Dropbox and Google Drive.

Copyright (c) 2025 50SEA

This work is licensed under a Creative Commons Attribution 4.0 International License.