Fine-tune BERT for Implicit Hate Speech Binary-Classification on Social Media Corpora
Keywords:
Implicit Hate Speech, BERT, Binary Classification, Social Media Analysis, Natural Language Processing, Transformer Models, Text Classification
Abstract
Implicit hate speech challenges automated content moderation systems because, unlike explicit hate speech, it is conveyed through indirect, context-dependent, and culturally subtle language such as sarcasm, stereotypes, coded language, and metaphor. This paper examines how effectively contextual transformer models detect fine-grained implicit hate speech on social media platforms. A BERT-based binary classification model is fine-tuned on a unified dataset that combines the Davidson Hate Speech, Jigsaw Toxic Comment, and Implicit Hate Speech datasets. A structured preprocessing workflow addresses class imbalance, annotation diversity, and domain shift. Experimental results show that the proposed model achieves over 94% accuracy, along with high F1-score and recall for the hate class. These results indicate that token-level contextual features learned through self-attention capture subtle linguistic patterns. Overall, the paper underscores the importance of contextual learning with transformer models for building robust and scalable automated moderation systems.
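The abstract reports accuracy together with F1-score and recall for the hate class, and names class imbalance as a preprocessing concern. The short sketch below illustrates how these quantities are defined and why accuracy alone can mislead on imbalanced data. It is not the paper's code: the function names, the toy labels, and the inverse-frequency weighting scheme are assumptions made for illustration, with 1 standing for the hate class.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall, and F1 for the positive (hate) class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def inverse_frequency_weights(labels):
    """Per-class weights n / (k * count): rarer classes get larger weights.

    Assumed weighting scheme for illustration; the paper does not state
    which imbalance-handling method it uses.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


# Toy labels (hypothetical): 2 hate posts out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_non_hate = [0] * 10            # a trivial classifier
some_model = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

trivial = binary_metrics(y_true, always_non_hate)
model = binary_metrics(y_true, some_model)
weights = inverse_frequency_weights(y_true)
```

On these toy labels the trivial classifier reaches the same accuracy as `some_model` while recalling none of the hate posts, which is why the hate-class recall and F1 are reported alongside accuracy.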
References
Anchal Rawat, Santosh Kumar, Surender Singh Samant, “Hate speech detection in social media: Techniques, recent trends, and future challenges,” Wiley Interdiscip. Rev. Comput. Stat., 2024, [Online]. Available: https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/wics.1648
Yejin Lee, Joonghyuk Hahn, Hyeseon Ahn, Yo-Sub Han, “AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection,” arXiv:2505.19528, 2025, [Online]. Available: https://arxiv.org/abs/2505.19528
Nicolás Benjamín Ocampo, Elena Cabrio, Serena Villata, “Unmasking the Hidden Meaning: Bridging Implicit and Explicit Hate Speech Embedding Representations,” Find. Assoc. Comput. Linguist. EMNLP, 2023, [Online]. Available: https://aclanthology.org/2023.findings-emnlp.441/
Nadia Mushtaq Gardazi, Ali Daud, Muhammad Kamran Malik, Amal Bukhari, Tariq Alsahfi, “BERT applications in natural language processing: a review,” Artif. Intell. Rev., vol. 58, no. 166, 2025, [Online]. Available: https://link.springer.com/article/10.1007/s10462-025-11162-5
K. P. Hao Zhuo, Yicheng Yang, “Combating Toxic Language: A Review of LLM-Based Strategies for Software Engineering,” arXiv:2504.15439, 2025, [Online]. Available: https://arxiv.org/abs/2504.15439
Endrit Fetahi, Arsim Susuri, Mentor Hamiti, Zenun Kastrati, Ercan Canhasi, “Enhancing social media hate speech detection in low-resource languages using transformers and explainable AI,” Soc. Netw. Anal. Min., vol. 15, no. 82, 2025, [Online]. Available: https://link.springer.com/article/10.1007/s13278-025-01497-w
Gil Ramos, Fernando Batista, Ricardo Ribeiro, Pedro Fialho, Sérgio Moro, António Fonseca, Rita Guerra, Paula Carvalho, Catarina Marques, “A comprehensive review on automatic hate speech detection in the age of the transformer,” Soc. Netw. Anal. Min., vol. 14, no. 204, 2024, [Online]. Available: https://link.springer.com/article/10.1007/s13278-024-01361-3
Arumugham Palaniammal, Purushothaman Anandababu, “Sarcasm detection on social data: heuristic search and deep learning,” IAES Int. J. Artif. Intell., vol. 13, no. 4, 2024, [Online]. Available: https://ijai.iaescore.com/index.php/IJAI/article/view/24944
Lihong Zhang, Muhammad Faseeh, Syed Shehryar Ali Naqvi, Liang Hu, Anwar Ghani, “Enhancing sarcasm detection on social media: A comprehensive study using LLMs and BERT with multi-headed attention on SARC,” PLOS ONE, 2025, [Online]. Available: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0334120
Santosh Chapagain, Shah Muhammad Hamdi, Soukaina Filali Boubrahimi, “Advancing Hate Speech Detection with Transformers: Insights from the MetaHate,” arXiv:2508.04913, 2025, [Online]. Available: https://arxiv.org/abs/2508.04913
Seohyun Yoo, Eunbae Jeon, Joonseo Hyeon, Jaehyuk Cho, “Adaptive ensemble techniques leveraging BERT based models for multilingual hate speech detection in Korean and English,” Sci. Rep., vol. 19844, 2025, [Online]. Available: https://www.nature.com/articles/s41598-025-88960-y
Vassiliy Cheremetiev, Quang Long Ho Ngo, Chau Ying Kot, Alina Elena Baia, Andrea Cavallaro, “Specializing General-purpose LLM Embeddings for Implicit Hate Speech Detection across Datasets,” arXiv:2508.20750, 2025, [Online]. Available: https://arxiv.org/abs/2508.20750
“Hate Speech and Offensive Language Detection.” Accessed: Mar. 13, 2026. [Online]. Available: https://www.kaggle.com/datasets/thedevastator/hate-speech-and-offensive-language-detection
“jigsaw-toxic-comment-classification-challenge.” Accessed: Feb. 22, 2026. [Online]. Available: https://www.kaggle.com/datasets/julian3833/jigsaw-toxic-comment-classification-challenge
“GitHub - SALT-NLP/implicit-hate.” Accessed: Feb. 22, 2026. [Online]. Available: https://github.com/SALT-NLP/implicit-hate
Vitthal Bhandari, “On the Challenges of Building Datasets for Hate Speech Detection,” arXiv:2309.02912, vol. 9, 2023, [Online]. Available: https://arxiv.org/abs/2309.02912
Hind Saleh, Areej Alhothali, “Detection of Hate Speech using BERT and Hate Speech Word Embedding with Deep Model,” Appl. Artif. Intell., vol. 37, no. 1, 2023, [Online]. Available: https://www.tandfonline.com/doi/full/10.1080/08839514.2023.2166719
Jitendra Singh Malik, Hezhe Qiao, Guansong Pang, Anton van den Hengel, “Deep Learning for Hate Speech Detection: A Comparative Study,” Int. J. Data Sci. Anal., 2023, [Online]. Available: https://arxiv.org/abs/2202.09517
Mahmoud Abusaqer, Jamil Saquer, “Efficient Hate Speech Detection: Evaluating 38 Models from Traditional Methods to Transformers,” ACMSE 2025 - Proc. 2025 ACM Southeast Conf., pp. 203–214, 2025, [Online]. Available: https://dl.acm.org/doi/10.1145/3696673.3723061
License
Copyright (c) 2026 50sea

This work is licensed under a Creative Commons Attribution 4.0 International License.


















