The Customer Reviews Analysis Platform by Correlating Sentiment Analysis and Text Clustering

Authors

  • Ehtisham Ur Rehman, University of Engineering and Technology, Peshawar
  • Najam Aziz
  • Nasir Ahmad

Keywords:

Text Clustering, K-means, Latent Dirichlet Allocation Algorithm, Sentiment Analysis, Customer Reviews Feedback

Abstract

Customer reviews and feedback are of paramount importance in the improvement cycle of any industry, product, or service. Formerly, product ratings were the basis for performance evaluation and the key drivers of improvement. However, ratings could not depict the complete picture and were not adequate for an in-depth analysis of a product or service. Hence, customer reviews have become the primary source of feedback for detailed analysis, as well as a contributor to performance metrics. Although customer reviews provide an essential measure for performance evaluation, extracting important features and topics from them has been challenging because the data is unlabeled and highly varied. This paper focuses on extracting topics from customer review data and putting the implicit knowledge they contain to use for analytics. To extract topics and clusters from review data, unsupervised machine learning algorithms such as K-Means and Latent Dirichlet Allocation (LDA) are used. These topics are then correlated with the sentiment score (positive or negative feedback) of each customer review. The products or services are then categorized by the topics or domains they belong to, alongside the sentiments. This provides valuable analysis, such as the score of positive, neutral, and negative feedback for each customer review, to new customers as well as product managers. This research work uses a hotel reviews dataset to categorize and rank hotels based on the different services discussed in the text of customer reviews. Moreover, this paper also provides a visualization of both text clustering algorithms, depicting the topics in each cluster for an insightful analysis.
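The idea of correlating clusters with sentiment can be illustrated with a minimal sketch. It is not the authors' implementation: the toy reviews, the tiny sentiment lexicon, the fixed initial centroids, and every function name below are assumptions made for illustration, and only the K-Means half of the approach is shown (LDA is omitted).

```python
from collections import Counter, defaultdict

# Hypothetical toy data (hotel, review text); not the paper's dataset.
REVIEWS = [
    ("Hotel A", "clean room comfortable bed"),
    ("Hotel B", "dirty room broken bed"),
    ("Hotel B", "friendly staff helpful staff"),
    ("Hotel A", "rude staff slow service"),
]

# Tiny illustrative lexicon; an assumption, not the paper's sentiment model.
POSITIVE = {"clean", "comfortable", "friendly", "helpful"}
NEGATIVE = {"dirty", "broken", "rude", "slow"}


def vectorize(texts):
    """Turn each text into a term-count vector over a shared vocabulary."""
    vocab = sorted({word for text in texts for word in text.split()})
    vectors = []
    for text in texts:
        counts = Counter(text.split())
        vectors.append([float(counts[word]) for word in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, iterations=10):
    """Plain K-Means; initial centroids are fixed so the run is deterministic."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sentiment(text):
    """Label a review by counting lexicon hits."""
    words = text.split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def correlate(reviews, init_indices):
    """Count sentiment labels per (hotel, cluster) pair."""
    texts = [text for _, text in reviews]
    _, vectors = vectorize(texts)
    labels = kmeans(vectors, init_indices)
    summary = defaultdict(Counter)
    for (hotel, text), cluster in zip(reviews, labels):
        summary[(hotel, cluster)][sentiment(text)] += 1
    return summary


if __name__ == "__main__":
    for key, counts in sorted(correlate(REVIEWS, init_indices=[0, 2]).items()):
        print(key, dict(counts))
```

On this toy input the two clusters separate room-related from staff-related reviews, so each hotel gets a sentiment count per service area. A real pipeline would need proper text preprocessing, a principled choice of the number of clusters, and a trained sentiment model.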

Published

2024-06-03

How to Cite

Rehman, E. U., Aziz, N., & Ahmad, N. (2024). The Customer Reviews Analysis Platform by Correlating Sentiment Analysis and Text Clustering. International Journal of Innovations in Science & Technology, 6(5), 312–328. Retrieved from https://journal.50sea.com/index.php/IJIST/article/view/852