Performance of large language models as an information resource on functional hypothalamic amenorrhea for patients and healthcare professionals

Introduction: To assess and compare the accuracy, readability, and overall performance of large language models (LLMs) in answering questions about functional hypothalamic amenorrhea (FHA) for patients and healthcare professionals.

Methods: A total of 11 patient-level and 15 clinician-level FHA-related questions were entered separately into four LLMs: ChatGPT 3.5 (free version), ChatGPT 4.0 (updated, paid subscription), Gemini, and OpenEvidence. OpenEvidence was used only for clinician-based question