Triological Society 2024 Combined Sections Meeting

Poster A057

Status: File under review
Section: Southern
Track: General
Presenter: Nick Melott, MS

Good News: ChatGPT Can’t Pass the Otolaryngology Boards… Yet!

Author(s)
Nick Melott, MS
Aurelia Monk, BA
Shreyas Pyati, BABS
Cameron Worden, MD
Ezer H Benaim, MD
Scott Hardison, MD, FAAOA
Matt Lelegren, MD
Brian D Thorp, MD, FACS, FARS
Charles S Ebert Jr., MD, MPH, FARS, FAAOA, FACS
Cristine Klatt-Cromwell, MD
Adam J Kimple, MD, PhD, FACS, FARS
Brent A. Senior, MD, FACS, FARS

Affiliation(s)
Department of Otolaryngology—Head and Neck Surgery, University of North Carolina, Chapel Hill, NC, USA

Abstract:
Objective: ChatGPT is an artificial intelligence large language model that has demonstrated the ability to perform well on medical licensing and specialty board exams. This project examines ChatGPT’s accuracy on questions developed by a popular test-prep company for trainees preparing to take the American Board of Otolaryngology – Head and Neck Surgery (ABOto) Written Qualifying Examination (WQE).
Methods: Two hundred ninety-six otolaryngology questions from BoardVitals were presented to ChatGPT in three prompt formats: open-ended (OE), multiple-choice without justification (MCNJ), and multiple-choice with justification (MCJ). Each prompt was entered into a new ChatGPT session, and the response was recorded as correct, incorrect, or indeterminate. Questions were classified as “less difficult” if the national average percent correct was ≥ 70% and as “more difficult” if it was below that threshold. A cutoff score for passing was set at 60%, based on established question difficulty levels for the WQE.
Results: ChatGPT correctly answered open-ended questions at a rate of 55.1%, MCNJ at 58.4%, and MCJ at 53.7%. Comparative performance based on difficulty was as follows (less/more difficult): open-ended (66.1%/31.2%), MCNJ (69.4%/37.9%), MCJ (65.3%/32%).
Conclusion: ChatGPT demonstrated accuracy levels on ABOto questions consistent with failing scores on all question prompt types. It performed more accurately on questions with a national average percent correct of greater than or equal to 70%.
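
The comparison described in the Methods and Results can be illustrated with a short Python sketch. It is not the authors' analysis code. The accuracy figures, the 60% passing cutoff, and the 70% difficulty threshold are taken from the abstract. The function names are hypothetical, and treating a score exactly at the cutoff as passing is an assumption, since the abstract does not say whether the cutoff is inclusive.

```python
# Illustrative sketch only; names are hypothetical, values come from the abstract.

# Overall accuracy (%) reported for each prompt format.
REPORTED_ACCURACY = {"OE": 55.1, "MCNJ": 58.4, "MCJ": 53.7}

PASS_CUTOFF = 60.0           # passing cutoff (%) stated in Methods
DIFFICULTY_THRESHOLD = 70.0  # national average percent correct (%)


def classify_difficulty(national_percent_correct: float) -> str:
    """Label a question by its national average percent correct."""
    if national_percent_correct >= DIFFICULTY_THRESHOLD:
        return "less difficult"
    return "more difficult"


def passes(accuracy_percent: float, cutoff: float = PASS_CUTOFF) -> bool:
    """Assumption: a score equal to the cutoff counts as passing."""
    return accuracy_percent >= cutoff


def summarize(accuracy_by_format: dict) -> dict:
    """Map each prompt format to whether its accuracy meets the cutoff."""
    return {fmt: passes(acc) for fmt, acc in accuracy_by_format.items()}


if __name__ == "__main__":
    for fmt, passed in summarize(REPORTED_ACCURACY).items():
        status = "pass" if passed else "fail"
        print(f"{fmt}: {REPORTED_ACCURACY[fmt]}% ({status})")
```

Under these assumptions, all three prompt formats fall below the 60% cutoff, which matches the Conclusion.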

A057 - Good News: ChatGPT Can’t Pass the Otolaryngology Boards… Yet!
A058 - Sociodemographic Trends in Hypoglossal Nerve Stimulation: A Scoping Review
A059 - Using Artificial Intelligence to Improve Readability and Comprehension Levels of Otolaryngology Patient Education Materials
A060 - Classification of Eustachian Tube Dysfunction Phenotypes
A061 - Dexmedetomidine and Propofol Anesthetic Regimen Reduces Drug Induced Sleep Endoscopy Operative Time Compared to Dexmedetomidine Alone
A062 - Improving the Patient Experience: What Makes a Difference?
A063 - Google Is of Poor Quality for Postoperative Tracheostomy Care Information
A064 - YouTube Is of Moderate Quality for Postoperative Tracheostomy Care Information
A065 - Does Learning the Inner Ear Make You Feel Dizzy?
A066 - Performance of ChatGPT in Novel Otolaryngology Systematic Review Ideation
A067 - Racial Disparities in Parotidectomy Surgery: A NSQIP Analysis
A068 - Evaluating Prevalence and Barriers to Health Disparities Education among Otolaryngology Residency Training
A069 - Post-Tonsillectomy Hemorrhage and SSRI Use
A070 - WITHDRAWN

