The role of large language models in emergency care: a comprehensive benchmarking study
Borna Naderi · Alexander Fortenko · Robert Tanouye · Anita Ghandehari · Longsha Liu · R. Andrew Taylor · Darius Khoshons · Christian Davidson · Neil Bhavsar · Nancy Creech · Justin Norden · R. Sharma · Shriman Balasubramanian
With emergency departments (EDs) increasingly overburdened, large language models (LLMs) may help streamline workflows and decision-making. We evaluated their emergency medicine knowledge and their performance on simulated ED tasks. This two-part study first tested the factual knowledge of 18 LLMs using a curated MedMCQA subset covering 12 ED chief complaints, assessing accuracy, precision, and recall. Five models (GPT-5, GPT-4, Claude 3.5, Claude 4, and LLaMA 3.1) were then evaluated on patient summaries, Emergency Severity Inde
