Experimental Evaluation of Related Papers Finding Techniques

Authors

  • Muhammad Ammad Idrees, Dept. of Artificial Intelligence, Capital University of Science and Technology
  • M. Abdul Qadir, Dept. of Computer Science, Capital University of Science and Technology
  • Maryam Nageen, Dept. of Artificial Intelligence, Capital University of Science and Technology

Keywords:

Paper Recommendation, Bibliometric Technique, Semantic Similarity, Jensen-Shannon Divergence, Citation-Based Technique

Abstract

Introduction/Importance of Study: Related-paper recommendation systems generally fall into two categories: Content-Based (CB) approaches, which estimate relatedness using semantic similarity between research paper texts, and metadata-based approaches, which infer relatedness from bibliographic information such as citations, references, authorship, and publication venue. Although CB methods such as Jensen–Shannon Divergence (JSD) computed over TF–IDF representations provide accurate relatedness scores, they are computationally expensive because they require processing the full text of each paper. Metadata-based methods offer a more efficient substitute, but their effectiveness relative to strong CB measures remains unclear. This study investigates which bibliometric technique correlates most strongly with JSD-based semantic relatedness, in order to identify a low-cost substitute for computationally expensive CB methods.

Novelty Statement: Since no existing dataset contained the required combination of full text, citations, references, and “related papers” lists, we constructed a new dataset of 1,225 papers, selected to statistically represent the population for a target keyword at 95% confidence with ±2.8% margin of error. No prior research has analyzed bibliometric methods using a unified dataset.
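
The stated sample size and margin of error can be checked with the standard formula for a proportion. The sketch below assumes the worst-case proportion p = 0.5 and no finite-population correction; the abstract does not say which formula was used, so both choices are assumptions, and the function name margin_of_error is illustrative.

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Margin of error for a sample proportion.

    z=1.96 corresponds to 95% confidence; p=0.5 is the worst case
    (an assumption, not stated in the abstract).
    """
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(1225)
print(f"{moe:.3f}")  # 0.028, i.e. about ±2.8%
```

Under these assumptions, 1,225 papers give 1.96 × 0.5 / 35 = 0.028, which matches the ±2.8% reported above.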

Material and Method: JSD-based relatedness scores were computed using full-text TF–IDF representations for all papers. We then calculated bibliographic relatedness using bibliographic coupling (BC), co-citation coupling (CC), and Katz similarity, and also extracted relatedness scores from Semantic Scholar (SS).
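
The four computed measures named above can be sketched on toy data as follows. This is a minimal illustration, not the authors' implementation: the tokenized documents, the citation dictionary refs, the smoothed IDF formula, the undirected treatment of citations in Katz, the truncation at max_len, and the damping factor beta are all assumptions. Turning the divergence into a relatedness score (for example 1 - JSD) is likewise an assumption.

```python
import math
from collections import Counter

def tfidf_distributions(docs):
    """Turn tokenized documents into TF-IDF vectors normalized to sum to 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    dists = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption) keeps weights of present terms positive.
        weights = [tf[t] / len(doc) * (math.log((1 + n) / (1 + df[t])) + 1)
                   for t in vocab]
        total = sum(weights)
        dists.append([w / total for w in weights])
    return dists

def jsd(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def bibliographic_coupling(refs, a, b):
    """Number of references shared by papers a and b."""
    return len(refs.get(a, set()) & refs.get(b, set()))

def co_citation(refs, a, b):
    """Number of papers that cite both a and b."""
    return sum(1 for cited in refs.values() if a in cited and b in cited)

def katz(refs, a, b, beta=0.1, max_len=4):
    """Truncated Katz similarity: sum of beta**l * (walks of length l from a to b)."""
    nodes = set(refs) | {r for cited in refs.values() for r in cited}
    neighbors = {node: set() for node in nodes}
    for src, cited in refs.items():
        for dst in cited:
            # Citations are treated as undirected edges (an assumption).
            neighbors[src].add(dst)
            neighbors[dst].add(src)
    walks = {node: 0 for node in nodes}
    walks[a] = 1
    score = 0.0
    for length in range(1, max_len + 1):
        # Advance every walk by one step, then count those ending at b.
        walks = {node: sum(walks[m] for m in neighbors[node]) for node in nodes}
        score += beta ** length * walks[b]
    return score

# Hypothetical toy data: each paper maps to the set of papers it cites.
refs = {
    "P1": {"R1", "R2", "R3"},
    "P2": {"R2", "R3", "R4"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P2"},
}
docs = [["citation", "graph", "coupling"], ["citation", "graph", "katz"]]
p, q = tfidf_distributions(docs)

print(jsd(p, q))
print(bibliographic_coupling(refs, "P1", "P2"))  # 2 (shared: R2, R3)
print(co_citation(refs, "P1", "P2"))             # 2 (cited together by P3, P4)
print(katz(refs, "P1", "P2"))
```

Only the JSD step needs the full text; the three bibliographic measures read the citation dictionary alone, which is the cost difference the study relies on.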

Result and Discussion: Correlation analysis revealed the following Pearson correlations with JSD: BC = 0.40, SS = 0.35, Katz = 0.01, CC = –0.11. These results indicate that BC-based relatedness aligns most closely with CB semantic similarity, followed by SS, while Katz shows negligible correlation and CC a weak negative correlation. Notably, Semantic Scholar’s related-paper measure correlates less strongly with JSD than bibliographic coupling does, a finding that is both surprising and practically important.
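
For readers who want to reproduce this kind of comparison, the Pearson coefficient between two lists of paper-pair scores can be computed as in the sketch below. The score lists are hypothetical placeholders chosen only to exercise the function; they are not the study's data and do not reproduce the reported coefficients.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for five paper pairs (illustration only).
jsd_relatedness = [0.82, 0.40, 0.65, 0.10, 0.55]
bc_counts = [7, 2, 3, 0, 6]

r = pearson(jsd_relatedness, bc_counts)
print(f"r = {r:.2f}")
```

The same function would be applied once per technique (BC, CC, Katz, SS), each time against the JSD-based scores for the same paper pairs.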

Concluding Remarks: Overall, the results highlight the potential of BC-based methods as an efficient and reliable alternative to traditional full-text similarity computations for estimating relatedness.

Published

2026-05-11

How to Cite

Muhammad Ammad Idrees, M. Abdul Qadir, & Maryam Nageen. (2026). Experimental Evaluation of Related Papers Finding Techniques. International Journal of Innovations in Science & Technology, 8(3), 403–417. Retrieved from https://journal.50sea.com/index.php/IJIST/article/view/1790