Touro University Research Day 2025

Poster SBESH31

Status: File under review
Section: TUMT
Track: SBESH
Presenter: Lovingly Ocampo

Artificial Intelligence vs. Orthopedic-In-Training-Examination: A Cause for Concern or Excitement?

Author(s)
Ashraf Nawari ¹
Jamal Zahir ²
Sonal Kumar ²
Lovingly Ocampo ³
Olivia Opara ²
Hassan Ahmad ²
Benjamin Crawford ⁴
Brian Feeley, MD (faculty) ¹

Affiliation(s)
¹Department of Orthopaedic Surgery, University of California, San Francisco, CA; ²Department of Surgery, Ross University School of Medicine; ³College of Osteopathic Medicine, Touro Middletown, NY; ⁴Department of Orthopaedic Surgery, St Mary's Medical Center San Francisco Orthopaedic Residency Program

Abstract:
Background:
The rapid improvement of generative artificial intelligence (AI) models in medical domains, including answering board-style questions, warrants further investigation of their utility and accuracy in answering orthopaedic surgery written board questions. Previous studies have analyzed the performance of ChatGPT alone on board exams, but a head-to-head analysis of multiple current AI models has yet to be performed. The objective of this study was therefore to compare the utility and accuracy of various large language models (LLMs) in answering Orthopaedic Surgery In-Training Exam (OITE) written board questions, both against one another and against orthopaedic surgery residents.
Methods:
A complete set of questions from the 2022 OITE was entered into various LLMs, and the results were calculated and compared with the national performance of orthopaedic surgery residents. Results were analyzed by overall performance and by question type. Type A questions tested knowledge and recall of facts, Type B questions involved diagnosis and analysis of information, and Type C questions focused on the evaluation and management of diseases, requiring knowledge and reasoning to develop treatment plans.
Results:
Google Gemini was the most accurate tool, answering 69.9% of questions correctly. It also outperformed ChatGPT and Claude on Type A (76.9%) and Type C (67.4%) questions, while Claude performed best on Type B questions (70.7%). Questions without images were answered more accurately than those with images (65.9% vs. 34.1%). All LLMs performed above the average of a first-year orthopaedic surgery intern, with Google Gemini and Claude approaching the performance of fourth- and fifth-year orthopaedic surgery residents.

