Author(s)
Young Lee
Ajibola B. Bakare
Jhuree Hong
Claus-Peter Richter
Jonathan Kuriakose
Affiliation(s)
Central Michigan University College of Medicine; Tulane School of Medicine; Freelance; Northwestern Medicine Department of Otolaryngology
Abstract:
Educational Objective: At the conclusion of this presentation, the participants should be able to understand the potential use of LLMs in systematic reviews of topics within OHNS.
Objectives: Large language models (LLMs), such as ChatGPT and Bard, are generative deep learning algorithms that can process large datasets. There are several potential uses for LLMs within medical research; however, few studies have examined their potential for carrying out systematic reviews in otolaryngology-head and neck surgery (OHNS). This study aims to compare the efficacy of ChatGPTv3.5 and Bard in conducting systematic literature reviews within OHNS.
Study Design: Literature review comparative analysis.
Methods: The methods of three systematic reviews that followed PRISMA guidelines were replicated using ChatGPTv3.5 and Bard. The outputs generated were compiled by author, paper title, publication year, and journal, and compared with the reference articles cited in the corresponding systematic review. Each output was cross-referenced with medical databases to determine the authenticity of the outputs' journals.
Results: Several themes emerged when comparing Bard and ChatGPT across the three reference systematic reviews. In replicating Wong et al.’s review, Bard generated more outputs than ChatGPT. In replicating Jabbour et al.’s review, Bard's outputs covered a broader date range than ChatGPT's. Finally, in replicating Wu et al.’s review, ChatGPT#2 identified more genuine outputs than Bard#2.
Conclusions: LLMs did not accurately replicate the methodology of a peer-reviewed manuscript and should be used with caution. The outputs contained several inaccuracies, ranging from fictitious citations to citations that were only partially correct. Neither Bard nor ChatGPT identified authentic papers accurately enough for use in systematic reviews. PRISMA and other literature review guidelines remain the gold standard.