The Customer Reviews Analysis Platform by Correlating Sentiment Analysis and Text Clustering
Text Clustering, K-means, Latent Dirichlet Allocation Algorithm, Sentiment Analysis, Customer Reviews Feedback
Abstract
Customer reviews and feedback are central to the improvement cycle of any industry, product, or service. Formerly, product ratings were the basis for performance evaluation and the key drivers of improvement. However, ratings alone do not give the complete picture and are not adequate for an in-depth analysis of a product or service. Hence, customer reviews have become the primary source of feedback for detailed analysis, as well as a contributor to performance metrics. Although customer reviews provide an essential measure for performance evaluation, extracting important features and topics from them has been challenging because the text is unlabeled and highly variable. This paper focuses on extracting topics from customer review data and putting the implicit knowledge it contains to use for analytics. To extract topics and clusters from review data, unsupervised machine learning algorithms such as K-Means and Latent Dirichlet Allocation (LDA) are used. These topics are then correlated with the sentiment analysis of each customer review, that is, a score of positive, neutral, or negative feedback. The products or services are then categorized by the topics or domains they belong to, alongside the sentiments. This provides a valuable analysis for new customers as well as product managers. This research work uses a hotel reviews dataset to categorize and rank hotels based on the different services discussed in the text of customer reviews. Moreover, this paper provides a visualization of both text clustering algorithms, depicting the topics in each cluster for insightful analysis.
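As a rough illustration of the correlation step described above, the following minimal sketch aggregates per-review sentiment by topic and ranks hotels within each topic. It is not the platform's implementation: the records, hotel names, topic labels, and the functions `mean_sentiment_by_topic` and `rank_hotels` are all hypothetical, and the topic labels and sentiment scores are assumed to have been produced upstream (for example by K-Means or LDA and a sentiment model).

```python
from collections import defaultdict

# Hypothetical toy records: (hotel, topic label, sentiment score in [-1, 1]).
# In a real pipeline the topic would come from K-Means or LDA and the score
# from a sentiment model; here both are hard-coded for illustration.
reviews = [
    ("Hotel A", "cleanliness", 0.8),
    ("Hotel A", "cleanliness", 0.4),
    ("Hotel A", "staff", -0.6),
    ("Hotel B", "cleanliness", -0.2),
    ("Hotel B", "staff", 0.9),
    ("Hotel B", "staff", 0.5),
]


def mean_sentiment_by_topic(records):
    """Return {topic: {hotel: mean sentiment}} from (hotel, topic, score) rows."""
    # sums[topic][hotel] holds a running [total, count] pair.
    sums = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for hotel, topic, score in records:
        cell = sums[topic][hotel]
        cell[0] += score
        cell[1] += 1
    return {
        topic: {hotel: total / count for hotel, (total, count) in hotels.items()}
        for topic, hotels in sums.items()
    }


def rank_hotels(records):
    """Return {topic: [hotel, ...]} ordered from most to least positive."""
    means = mean_sentiment_by_topic(records)
    return {
        topic: sorted(hotels, key=hotels.get, reverse=True)
        for topic, hotels in means.items()
    }


rankings = rank_hotels(reviews)
for topic, hotels in rankings.items():
    print(topic, hotels)
```

Averaging is only one possible way to combine scores; a weighted or count-aware aggregate may suit sparse topics better.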
