Author(s)
Nick Melott, MS
Aurelia Monk, BA
Shreyas Pyati, BABS
Cameron Worden, MD
Ezer H Benaim, MD
Scott Hardison, MD, FAAOA
Matt Lelegren, MD
Brian D Thorp, MD, FACS, FARS
Charles S Ebert Jr., MD, MPH, FARS, FAAOA, FACS
Cristine Klatt-Cromwell, MD
Adam J Kimple, MD, PhD, FACS, FARS
Brent A. Senior, MD, FACS, FARS
Affiliation(s)
Department of Otolaryngology—Head and Neck Surgery, University of North Carolina, Chapel Hill, NC, USA
Abstract:
Objective: ChatGPT is an artificial intelligence large language model that has performed well on medical licensing and specialty board exams. This project examines ChatGPT’s accuracy on questions developed by a popular test-prep company for trainees preparing to take the American Board of Otolaryngology – Head and Neck Surgery (ABOto) Written Qualifying Examination (WQE).
Methods: Two hundred ninety-six otolaryngology questions from BoardVitals were presented to ChatGPT in three prompt formats: open-ended (OE), multiple-choice without justification (MCNJ), and multiple-choice with justification (MCJ). Each prompt was entered into a new ChatGPT session, and the response was recorded as correct, incorrect, or indeterminate. Questions were classified as “less difficult” if their national average percent correct was ≥ 70% and as “more difficult” if it was below that threshold. A cutoff score for passing was set at 60%, based on established question difficulty levels for the WQE.
Results: ChatGPT correctly answered open-ended questions at a rate of 55.1%, MCNJ at 58.4%, and MCJ at 53.7%. Comparative performance based on difficulty was as follows (less/more difficult): open-ended (66.1%/31.2%), MCNJ (69.4%/37.9%), MCJ (65.3%/32%).
Conclusion: ChatGPT demonstrated accuracy levels on ABOto questions consistent with failing scores on all question prompt types. It performed more accurately on questions with a national average percent correct of greater than or equal to 70%.
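The two thresholds described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function and variable names are hypothetical, and treating a score exactly at the cutoff as passing is an assumption. The accuracy values are those reported in the Results.

```python
# Illustrative sketch only; names are hypothetical, not from the study.
DIFFICULTY_THRESHOLD = 70.0  # national average percent correct (Methods)
PASS_CUTOFF = 60.0           # passing cutoff score (Methods)


def classify_difficulty(national_pct_correct: float) -> str:
    """Label a question by its national average percent correct."""
    if national_pct_correct >= DIFFICULTY_THRESHOLD:
        return "less difficult"
    return "more difficult"


def meets_cutoff(accuracy_pct: float) -> bool:
    """Assumption: a score equal to the cutoff counts as passing."""
    return accuracy_pct >= PASS_CUTOFF


# Overall accuracy by prompt format, as reported in the Results.
reported_accuracy = {"OE": 55.1, "MCNJ": 58.4, "MCJ": 53.7}

# Compare each prompt format against the passing cutoff.
outcomes = {fmt: meets_cutoff(acc) for fmt, acc in reported_accuracy.items()}
```

Under these assumptions, every prompt format falls below the 60% cutoff, which matches the Conclusion.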