Experimental Evaluation of Related Papers Finding Techniques
Keywords:
Paper Recommendation, Bibliometric Technique, Semantic Similarity, Jensen-Shannon Divergence, Citation-Based Technique
Abstract
Introduction/Importance of Study: Related-paper recommendation systems generally fall into two categories: Content-Based (CB) approaches, which estimate relatedness from the semantic similarity between research paper texts, and metadata-based approaches, which infer relatedness from bibliographic information such as citations, references, authorship, and publication venue. CB methods such as Jensen–Shannon Divergence (JSD) computed over TF–IDF representations provide accurate relatedness scores, but they are computationally expensive because they require processing the full text of each paper. Metadata-based methods are a cheaper alternative, but their effectiveness relative to strong CB measures remains unclear. This study investigates which bibliometric technique correlates most strongly with JSD-based semantic relatedness, with the aim of identifying a low-cost substitute for computationally expensive CB methods.
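As an illustration of the content-based baseline, the following is a minimal sketch of JSD over TF-IDF weights. The abstract does not specify the tokenization, the IDF variant, or how divergence is turned into a relatedness score, so the smoothed IDF, the normalization of each vector to sum to 1, and the use of 1 - JSD as relatedness are all assumptions, and the three short documents are hypothetical stand-ins for full texts.

```python
import math
from collections import Counter

def tfidf_distributions(docs):
    """Turn non-empty tokenized documents into TF-IDF vectors normalized to sum to 1."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    dists = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        weights = {
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        }
        total = sum(weights.values())
        dists.append({t: w / total for t, w in weights.items()})
    return dists

def jsd(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    terms = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in terms}

    def kl(a):
        # Kullback-Leibler divergence KL(a || M), skipping zero-probability terms.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

# Hypothetical toy corpus; real inputs would be the full text of each paper.
docs = [
    "citation graph analysis of citation networks".split(),
    "graph analysis of co-citation networks".split(),
    "protein folding simulation".split(),
]
dists = tfidf_distributions(docs)

# Relatedness taken as 1 - JSD (an assumption, not stated in the abstract).
relatedness_01 = 1 - jsd(dists[0], dists[1])
relatedness_02 = 1 - jsd(dists[0], dists[2])
print(relatedness_01, relatedness_02)
```

The cost concern raised above is visible even in this sketch: every pairwise score needs the complete term distribution of both papers, so the work grows with both vocabulary size and the number of paper pairs.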
Novelty Statement: Since no existing dataset contained the required combination of full text, citations, references, and “related papers” lists, we constructed a new dataset of 1,225 papers, selected to statistically represent the population for a target keyword at a 95% confidence level with a ±2.8% margin of error. No prior research has compared these bibliometric methods on a single unified dataset.
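The reported sampling figures can be checked with the standard margin-of-error formula for a proportion. The abstract does not say which formula was used; the sketch below assumes a simple random sample, the worst-case proportion p = 0.5, a z-value of 1.96 for 95% confidence, and no finite-population correction.

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Margin of error for a proportion estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# 1,225 papers, as reported in the abstract.
moe = margin_of_error(1225)
print(f"{moe:.3f}")  # 0.028
```

Under these assumptions, a sample of 1,225 gives 1.96 × 0.5 / 35 = 0.028, which matches the stated ±2.8%.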
Material and Method: JSD-based relatedness scores were computed using full-text TF–IDF representations for all papers. We then calculated bibliographic relatedness using bibliographic coupling (BC), co-citation coupling (CC), and Katz similarity, and also extracted relatedness scores from Semantic Scholar (SS).
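The three citation-based measures named above can be sketched on a toy citation graph. This is an illustration under stated assumptions rather than the study's implementation: the graph and paper identifiers are hypothetical, BC and CC are given as raw counts without normalization, and the Katz index is truncated at a maximum walk length and computed on an undirected view of the citation links. The damping factor `beta` and the cutoff `max_len` are illustrative choices.

```python
from collections import defaultdict

# Hypothetical toy citation graph: paper -> set of papers it cites.
cites = {
    "A": {"X", "Y", "Z"},
    "B": {"Y", "Z"},
    "C": {"A", "B"},
    "D": {"A", "B", "X"},
}

def bibliographic_coupling(a, b):
    """Number of references that papers a and b have in common."""
    return len(cites.get(a, set()) & cites.get(b, set()))

def co_citation(a, b):
    """Number of papers that cite both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)

def katz(a, b, beta=0.1, max_len=4):
    """Truncated Katz index: sum over walk lengths l of beta**l * (walks of length l from a to b)."""
    # Treat each citation as an undirected link (an assumption).
    adj = defaultdict(set)
    for src, refs in cites.items():
        for dst in refs:
            adj[src].add(dst)
            adj[dst].add(src)
    walks = {a: 1}  # number of walks of the current length ending at each node
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = defaultdict(int)
        for node, count in walks.items():
            for neighbour in adj[node]:
                nxt[neighbour] += count
        walks = nxt
        score += beta ** length * walks.get(b, 0)
    return score

print(bibliographic_coupling("A", "B"))  # 2 (shared references Y and Z)
print(co_citation("A", "B"))             # 2 (both cited by C and D)
print(katz("A", "B"))
```

BC needs only the reference lists of the two papers being compared, whereas CC depends on later papers citing both of them and Katz on the wider graph, which is one reason the measures can behave differently on the same set of papers.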
Result and Discussion: Correlation analysis yielded the following Pearson correlations with JSD: BC = 0.40, SS = 0.35, Katz = 0.01, CC = –0.11. These results indicate that BC-based relatedness aligns most closely with CB semantic similarity, followed by SS, while Katz shows a negligible correlation and CC a weak negative one. Notably, Semantic Scholar’s related-paper measure correlates less strongly with JSD than bibliographic coupling does, a finding that is both surprising and practically important.
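The comparison rests on Pearson's correlation coefficient between each bibliometric score and the JSD-based score over the same paper pairs. A minimal sketch of that computation follows; the score lists are hypothetical toy values chosen only to exercise the function and are not data from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical scores for five paper pairs (not taken from the study).
jsd_relatedness = [0.10, 0.35, 0.50, 0.62, 0.80]
bc_counts = [0, 2, 1, 4, 5]

r = pearson(jsd_relatedness, bc_counts)
print(round(r, 2))
```

A coefficient near 1 would mean the cheap measure ranks and scales pairs much like the full-text measure, a value near 0 means it carries little linear information about it, and a negative value means the two tend to move in opposite directions.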
Concluding Remarks: Overall, the results highlight the potential of BC-based methods as an efficient and reliable alternative to traditional full-text similarity computations for estimating relatedness.
License
Copyright (c) 2026 50sea

This work is licensed under a Creative Commons Attribution 4.0 International License.