Abstractive Urdu Text Summarization Using Multilingual Transformer Models: A Deep Learning Approach
Keywords:
Abstractive Summarization, Urdu Articles, Deep Learning, Multilingual Transformer Models

Abstract
Abstractive text summarization produces concise summaries by understanding the source text rather than copying from it directly. Urdu, a low-resource language with complex morphology and orthography, poses additional challenges for this task. This work investigates the extent to which deep learning models can automate Urdu text summarization. Addressing the challenges posed by Urdu, with attention both to overall summary quality and to word choice, we fine-tune multilingual transformer models on a dataset of Urdu news articles to produce accurate and coherent summaries. Quantitative evaluation with BERTScore shows that the fine-tuned mBART model achieves an F1 score of 0.497, outperforming mT5 (0.355). In contrast to most recent Urdu summarization studies (2023-2025), which predominantly report ROUGE-based scores, our methodology demonstrates stronger semantic consistency and abstractiveness.
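The BERTScore F1 reported above compares candidate and reference summaries through greedy matching of contextual token embeddings rather than n-gram overlap, which is why it rewards abstractive paraphrases that ROUGE penalizes. As a minimal sketch (not the authors' implementation, and omitting the idf weighting and baseline rescaling of the full metric), BERTScore-style F1 can be computed from two token-embedding matrices like this:

```python
import numpy as np

def bertscore_f1(cand_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """Greedy-matching F1 over token embeddings, in the style of BERTScore.

    cand_emb: (num_candidate_tokens, dim) embeddings of the generated summary.
    ref_emb:  (num_reference_tokens, dim) embeddings of the reference summary.
    """
    # Normalize rows to unit length so dot products are cosine similarities.
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = cand @ ref.T  # pairwise cosine similarity matrix

    # Precision: each candidate token greedily matches its best reference token.
    precision = sim.max(axis=1).mean()
    # Recall: each reference token greedily matches its best candidate token.
    recall = sim.max(axis=0).mean()
    return float(2 * precision * recall / (precision + recall))
```

In practice the embeddings would come from a pretrained multilingual encoder (the `bert-score` package handles model selection, idf weighting, and rescaling); the toy function above only illustrates the greedy-matching precision/recall/F1 computation on raw vectors.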
Copyright (c) 2025 50sea

This work is licensed under a Creative Commons Attribution 4.0 International License.