Quantifying Similarities: Oncology Documents from Google Bard and ChatGPT

Muhammad Naveed

Authors

Muhammad Naveed Department of Computer Science & Information Technology, University of Baluchistan, Quetta, Pakistan

Keywords:

Large Language Models, ChatGPT, Google Bard, Cosine Similarity, Jaccard Similarity

Abstract

Large language models hold immense promise for the future of text generation. Google Bard and ChatGPT, two prominent large language models originating from different research laboratories, have been subjects of various studies since their introduction. Despite numerous perspectives explored in the studies, none has specifically delved into the analysis of the similarity between texts generated by these models within the same category. This study addresses this gap by comparing the document generation capabilities of Google Bard and ChatGPT. The analysis focuses on topic-wise comparable documents related to oncology. In this study, 50 oncology-related documents generated by Google Bard are juxtaposed with equivalent topic-wise documents produced by ChatGPT, utilizing both cosine similarity and Jaccard similarity for comparison. The analysis employed statistical tests including the Kolmogorov-Smirnov test, Shapiro-Wilk test, and the one-sample Wilcoxon signed-rank test. The findings revealed a significant level of resemblance among the documents generated by both models: cosine similarity (mean = 0.66, std. dev. = 0.11, min = 0.23, max = 0.80) and Jaccard similarity (mean = 0.88, std. dev. = 0.06, min = 0.7, max = 1.0). This suggests a probable commonality in their training datasets or sources of oncology-related information. The study also posited that the observed similarity could be attributed to the probabilistic nature of language models and the potential for overfitting during their training processes. This study stands out for offering a unique direction and outcomes that pave the way for further exploration in the domain of large language models.

References

P. Hamet, and J. Trembla, “Artificial Intelligence in Medicine | Journal | ScienceDirect.com by Elsevier.” Accessed: Dec. 16, 2023. [Online]. Available: https://www.sciencedirect.com/journal/artificial-intelligence-in-medicine

R. E. Lopez-Martinez and G. Sierra, “Research trends in the international literature on natural language processing, 2000-2019 - A bibliometric study,” J. Scientometr. Res., vol. 9, no. 3, pp. 310–318, Sep. 2020, doi: 10.5530/JSCIRES.9.3.38.

Y. Shen et al., “ChatGPT and Other Large Language Models Are Double-edged Swords,” Radiology, vol. 307, no. 2, p. 2023, Apr. 2023, doi: 10.1148/RADIOL.230163/ASSET/IMAGES/LARGE/RADIOL.230163.FIG1.JPEG.

M. C. Rillig, M. Ågerstrand, M. Bi, K. A. Gould, and U. Sauerland, “Risks and Benefits of Large Language Models for the Environment,” Environ. Sci. Technol., vol. 57, no. 9, pp. 3464–3466, Mar. 2023, doi: 10.1021/ACS.EST.3C01106/ASSET/IMAGES/LARGE/ES3C01106_0004.JPEG.

A. Bryant et al., “Qualitative Research Methods for Large Language Models: Conducting Semi-Structured Interviews with ChatGPT and BARD on Computer Science Education,” Informatics 2023, Vol. 10, Page 78, vol. 10, no. 4, p. 78, Oct. 2023, doi: 10.3390/INFORMATICS10040078.

A. Egli and A. Egli, “ChatGPT, GPT-4, and Other Large Language Models: The Next Revolution for Clinical Microbiology?,” Clin. Infect. Dis., vol. 77, no. 9, pp. 1322–1328, Nov. 2023, doi: 10.1093/CID/CIAD407.

C. Tan Yip Ming et al., “The Potential Role of Large Language Models in Uveitis Care: Perspectives After ChatGPT and Bard Launch,” Ocul. Immunol. Inflamm., Aug. 2023, doi: 10.1080/09273948.2023.2242462.

S. Thapa and S. Adhikari, “ChatGPT, Bard, and Large Language Models for Biomedical Research: Opportunities and Pitfalls,” Ann. Biomed. Eng., vol. 51, no. 12, pp. 2647–2651, Dec. 2023, doi: 10.1007/S10439-023-03284-0/METRICS.

T. B. Brown et al., “Language Models are Few-Shot Learners,” Adv. Neural Inf. Process. Syst., vol. 2020-December, May 2020, Accessed: Dec. 16, 2023. [Online]. Available: https://arxiv.org/abs/2005.14165v4

Ö. AYDIN, “Google Bard Generated Literature Review: Metaverse,” J. AI, vol. 7, no. 1, pp. 1–14, Dec. 2023, doi: 10.61969/JAI.1311271.

R. C. T. Cheong et al., “Artificial intelligence chatbots as sources of patient education material for obstructive sleep apnoea: ChatGPT versus Google Bard,” Eur. Arch. Oto-Rhino-Laryngology, pp. 1–9, Nov. 2023, doi: 10.1007/S00405-023-08319-9/METRICS.

N. Fijačko, G. Prosen, B. S. Abella, S. Metličar, and G. Štiglic, “Can novel multimodal chatbots such as Bing Chat Enterprise, ChatGPT-4 Pro, and Google Bard correctly interpret electrocardiogram images?,” Resuscitation, vol. 193, p. 110009, Dec. 2023, doi: 10.1016/j.resuscitation.2023.110009.

N. S. Patil, R. S. Huang, C. B. van der Pol, and N. Larocque, “Comparative Performance of ChatGPT and Bard in a Text-Based Radiology Knowledge Assessment,” Can. Assoc. Radiol. J., Aug. 2023, doi: 10.1177/08465371231193716/ASSET/IMAGES/LARGE/10.1177_08465371231193716-FIG1.JPEG.

F. Y. Al-Ashwal, M. Zawiah, L. Gharaibeh, R. Abu-Farha, and A. N. Bitar, “Evaluating the Sensitivity, Specificity, and Accuracy of ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard Against Conventional Drug-Drug Interactions Clinical Tools,” Drug. Healthc. Patient Saf., vol. 15, pp. 137–147, Sep. 2023, doi: 10.2147/DHPS.S425858.

R. K. Gan, J. C. Ogbodo, Y. Z. Wee, A. Z. Gan, and P. A. González, “Performance of Google bard and ChatGPT in mass casualty incidents triage,” Am. J. Emerg. Med., vol. 75, pp. 72–78, Jan. 2024, doi: 10.1016/J.AJEM.2023.10.034.

R. Ali et al., “Performance of ChatGPT, GPT-4, and Google Bard on a Neurosurgery Oral Boards Preparation Question Bank,” Neurosurgery, vol. 93, no. 5, pp. 1090–1098, Nov. 2023, doi: 10.1227/NEU.0000000000002551.

A. K. D. Dhanvijay et al., “Performance of Large Language Models (ChatGPT, Bing Search, and Google Bard) in Solving Case Vignettes in Physiology,” Cureus, vol. 15, no. 8, Aug. 2023, doi: 10.7759/CUREUS.42972.

D. E. O and C. E. Daniel O, “An analysis of Watson vs. BARD vs. ChatGPT: The Jeopardy! Challenge,” AI Mag., vol. 44, no. 3, pp. 282–295, Sep. 2023, doi: 10.1002/AAAI.12118.

V. Plevris, G. Papazafeiropoulos, and A. Jiménez Rios, “Chatbots Put to the Test in Math and Logic Problems: A Comparison and Assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard,” AI, vol. 4, no. 4, pp. 949–969, Oct. 2023, doi: 10.3390/ai4040048.

S. Koga, N. B. Martin, and D. W. Dickson, “Evaluating the performance of large language models: ChatGPT and Google Bard in generating differential diagnoses in clinicopathological conferences of neurodegenerative disorders,” Brain Pathol., p. e13207, 2023, doi: 10.1111/BPA.13207.

I. Seth et al., “Comparing the Efficacy of Large Language Models ChatGPT, BARD, and Bing AI in Providing Information on Rhinoplasty: An Observational Study,” Aesthetic Surg. J. Open Forum, vol. 5, Jan. 2023, doi: 10.1093/ASJOF/OJAD084.

“Natural Language Processing with Python and spaCy: A Practical Introduction - Yuli Vasiliev - Google Books.” Accessed: Dec. 16, 2023. [Online]. Available: https://books.google.com.pk/books?id=w_ZqywEACAAJ&printsec=copyright&redir_esc=y#v=onepage&q&f=false

R. Spring and M. Johnson, “The possibility of improving automated calculation of measures of lexical richness for EFL writing: A comparison of the LCA, NLTK and SpaCy tools,” System, vol. 106, p. 102770, Jun. 2022, doi: 10.1016/J.SYSTEM.2022.102770.

R. Verma and A. Mittal, “Multiple attribute group decision-making based on novel probabilistic ordered weighted cosine similarity operators with Pythagorean fuzzy information,” Granul. Comput., vol. 8, no. 1, pp. 111–129, Jan. 2023, doi: 10.1007/S41066-022-00318-1/METRICS.

R. Zhang, Z. Xu, and X. Gou, “ELECTRE II method based on the cosine similarity to evaluate the performance of financial logistics enterprises under double hierarchy hesitant fuzzy linguistic environment,” Fuzzy Optim. Decis. Mak., vol. 22, no. 1, pp. 23–49, Mar. 2023, doi: 10.1007/S10700-022-09382-3/METRICS.

D. Dede Şener, H. Ogul, and S. Basak, “Text-based experiment retrieval in genomic databases,” J. Inf. Sci., Sep. 2022, doi: 10.1177/01655515221118670/ASSET/IMAGES/LARGE/10.1177_01655515221118670-FIG4.JPEG.

R. M. Suleman and I. Korkontzelos, “Extending latent semantic analysis to manage its syntactic blindness,” Expert Syst. Appl., vol. 165, p. 114130, Mar. 2021, doi: 10.1016/J.ESWA.2020.114130.

Y. Chen, S. Nan, Q. Tian, H. Cai, H. Duan, and X. Lu, “Automatic RadLex coding of Chinese structured radiology reports based on text similarity ensemble,” BMC Med. Inform. Decis. Mak., vol. 21, no. 9, pp. 1–11, Nov. 2021, doi: 10.1186/S12911-021-01604-9/TABLES/3.

T. Bin Sarwar, N. M. Noor, and M. S. U. Miah, “Evaluating keyphrase extraction algorithms for finding similar news articles using lexical similarity calculation and semantic relatedness measurement by word embedding,” PeerJ Comput. Sci., vol. 8, p. e1024, Jul. 2022, doi: 10.7717/PEERJ-CS.1024/SUPP-1.

D. Vogler, L. Udris, and M. Eisenegger, “Measuring Media Content Concentration at a Large Scale Using Automated Text Comparisons,” Journal. Stud., vol. 21, no. 11, pp. 1459–1478, Aug. 2020, doi: 10.1080/1461670X.2020.1761865.