The role of large language models in emergency care: a comprehensive benchmarking study

With emergency departments (EDs) increasingly overburdened, large language models (LLMs) may help streamline workflows and decision-making. We evaluated their emergency medicine knowledge and their performance on simulated ED tasks. This two-part study first tested the factual knowledge of 18 LLMs on a curated MedMCQA subset covering 12 ED chief complaints, measuring accuracy, precision, and recall. Five models (GPT-5, GPT-4, Claude 3.5, Claude 4, and LLaMA 3.1) were then evaluated on patient summaries, Emergency Severity Inde