Identification of Real and Fake Reviews Written in Roman Urdu

Asif Ahmed; Irfan Bacho; Shahnawaz Talpur

Authors

Asif Ahmed, Dept. of Computer Systems Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan
Irfan Bacho, Dept. of Computer Systems Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan
Shahnawaz Talpur, Dept. of Computer Systems Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan

Keywords:

fake reviews, deep learning algorithms, e-commerce, support vector machine, comparative analysis

Abstract

The evolution of e-commerce has made reviews a crucial metric for judging the quality of online products and services. These reviews significantly influence customers' purchasing decisions: positive reviews attract more buyers, while negative reviews reduce a product's sales. Deceptive reviews are now deliberately posted on e-commerce websites and social media stores to promote products by illegal means. Such reviews are sometimes written in local languages to build a fake virtual reputation among local customers. Fake review detection is therefore an active area of ongoing research. This paper proposes several machine-learning approaches to detect fake reviews written in Roman Urdu and presents a comparative analysis of the performance of nine machine learning models on the collected dataset, which was crawled from different e-commerce sites in Pakistan. The results show that the Support Vector Machine outperforms the other models, with an accuracy of 82%.
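The comparative analysis described above amounts to scoring each candidate model on the same labelled reviews and ranking the models by accuracy. The following is a minimal sketch of that evaluation step only, not the authors' pipeline: the two "models" are hypothetical placeholder rules standing in for trained classifiers, the function names are assumptions, and the four Roman Urdu reviews are made-up toy examples rather than data from the paper.

```python
from typing import Callable, Dict, List, Tuple

# A labelled review: (text, label), where 1 = fake and 0 = real.
Review = Tuple[str, int]
Classifier = Callable[[str], int]


def accuracy(predict: Classifier, data: List[Review]) -> float:
    """Fraction of reviews whose predicted label matches the true label."""
    correct = sum(1 for text, label in data if predict(text) == label)
    return correct / len(data)


def rank_models(models: Dict[str, Classifier],
                data: List[Review]) -> List[Tuple[str, float]]:
    """Score every model on the same data and sort by accuracy, best first."""
    scores = {name: accuracy(fn, data) for name, fn in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical placeholder "models"; a real study would plug in trained
# classifiers (for example an SVM) behind the same text -> label interface.
def always_real(text: str) -> int:
    return 0


def keyword_rule(text: str) -> int:
    hype_words = {"zabardast", "best", "number1"}
    hits = sum(word in hype_words for word in text.lower().split())
    return 1 if hits >= 2 else 0


# Made-up toy reviews for illustration only (not from the paper's dataset).
toy_test_set: List[Review] = [
    ("zabardast product best quality number1 seller", 1),
    ("delivery late thi lekin cheez theek hai", 0),
    ("best best zabardast sab log kharido", 1),
    ("size chota nikla wapas karna para", 0),
]

ranking = rank_models(
    {"always_real": always_real, "keyword_rule": keyword_rule},
    toy_test_set,
)
for name, acc in ranking:
    print(f"{name}: {acc:.2f}")
```

Keeping every model behind the same text-to-label interface means all candidates are scored on identical data, which is what makes the reported accuracies comparable.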
