Developing a Quranic QA System: Bridging Linguistic Gaps in Urdu Translation Using NLP and Transformer Model
Keywords:
Quranic Question-Answering (QA), Urdu Natural Language Processing (NLP), Transformer-based Models, RoBERTa, Semantic Search
Abstract
Urdu speakers have limited access to Quranic knowledge because Natural Language Processing (NLP) tools for Urdu are inadequate, which hinders precise understanding and retrieval of Quranic content. This research introduces a Transformer-based Urdu Quranic Question-Answering (QA) system that improves semantic accuracy and retrieval precision. Unlike conventional systems, which primarily support Arabic and English Quranic texts, this study uses Transformer-based technology to build a context-aware Urdu Quranic chatbot. The system addresses the linguistic gap in Quranic QA by improving both precision and semantic interpretation for Urdu users. It was trained on Fateh Muhammad Jalandhari’s Urdu translation of the Quran and fine-tuned with RoBERTa for semantic text analysis. It integrates TF-IDF with SBERT to improve question-answering performance. Multiple evaluation metrics were used to assess the system's precision and overall capability. The chatbot achieved a Mean Average Precision of 0.85, an Exact Match of 0.82, and an F1 Score of 0.88. User satisfaction reached 92%, indicating that it is effective at providing precise Quranic answers. Future updates will introduce voice detection features, expanded language support, and integration with Tafsir and Hadith databases for improved contextual understanding. This study advances Urdu Quranic information retrieval by providing an NLP-based solution for automated Islamic knowledge dissemination.
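The three reported metrics (Mean Average Precision, Exact Match, F1 Score) can be illustrated with a minimal sketch. The abstract does not specify the tokenization, text normalization, or evaluation script used, so the whitespace tokenization, the function names, and the verse identifiers below are assumptions made for illustration only, and this should not be read as the authors' evaluation code.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if both strings have identical whitespace-split tokens."""
    return float(prediction.split() == reference.split())


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Assumption: tokens are obtained by plain whitespace splitting.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def average_precision(ranked_ids, relevant_ids) -> float:
    """Average precision for one query.

    Assumption: ranked_ids contains no duplicate identifiers.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(runs) -> float:
    """Mean of average_precision over (ranked_ids, relevant_ids) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, `average_precision(["v1", "v2", "v3"], ["v1", "v3"])` rewards the system for ranking the relevant verses `v1` and `v3` first and third, where `v1`, `v2`, and `v3` are hypothetical verse identifiers. Exact Match is the strictest of the three measures, while token-level F1 gives partial credit for overlapping answers.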

Copyright (c) 2025 50SEA

This work is licensed under a Creative Commons Attribution 4.0 International License.