D1.34 - Guideline adherence and safety of AI chatbots in allergology
Background
Artificial intelligence (AI)-driven chatbots are increasingly used by patients and clinicians in allergy and asthma care. Despite their growing accessibility, systematic evaluations of their adherence to current clinical guidelines, safety, and potential clinical utility in allergology remain limited. The aim of the study was to compare widely accessible AI-driven chatbots in common allergology scenarios.
Method
Standardized clinical vignettes representing common allergic conditions (anaphylaxis, allergic rhinitis, and asthma) were presented to three AI chatbots (ChatGPT 5.0, Microsoft Copilot, Gemini 2.5 Flash) using a fixed prompt. Responses were independently assessed using a predefined 0-2 point scoring system across ten clinically relevant domains (maximum 20 points per response), including diagnosis, diagnostics, acute and chronic management, safety considerations, patient education, follow-up, practicality, structure, and guideline justification. Critical errors were identified and response times were recorded.
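The scoring arithmetic described above can be sketched as follows. This is a minimal illustration, not the authors' scoring instrument: the domain identifiers are paraphrased from the text, treating "acute and chronic management" as two separate domains is an assumption made to reach ten, and the example ratings are hypothetical.

```python
# Domain names are paraphrased from the abstract. Splitting "acute and chronic
# management" into two domains is an assumption made to arrive at ten domains.
DOMAINS = (
    "diagnosis",
    "diagnostics",
    "acute_management",
    "chronic_management",
    "safety",
    "patient_education",
    "follow_up",
    "practicality",
    "structure",
    "guideline_justification",
)
MAX_PER_DOMAIN = 2  # each domain is rated 0, 1, or 2


def total_score(ratings):
    """Sum per-domain ratings after checking that the rubric is complete."""
    if set(ratings) != set(DOMAINS):
        raise ValueError("ratings must cover exactly the ten rubric domains")
    for domain, points in ratings.items():
        if points not in range(MAX_PER_DOMAIN + 1):
            raise ValueError(f"{domain}: rating must be 0, 1, or 2")
    return sum(ratings.values())


# Hypothetical ratings: full marks everywhere except one partial score.
example = {domain: MAX_PER_DOMAIN for domain in DOMAINS}
example["diagnostics"] = 1

print(total_score(example), "/", MAX_PER_DOMAIN * len(DOMAINS))
```

A single one-point deduction in any domain gives 19 out of the 20 available points, which is the scale on which the results are reported.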
Results
Overall performance across all models was high, with predominantly guideline-consistent and clinically appropriate recommendations. In anaphylaxis scenarios, all chatbots achieved maximal scores (20/20) without critical errors. For allergic rhinitis, ChatGPT and Copilot achieved full scores (20/20), while Gemini scored 19/20 due to incomplete diagnostic evaluation. In asthma scenarios, all chatbots scored 19/20, primarily because their recommendation of SABA reliever therapy diverged slightly from the current GINA 2025 recommendations, which favor low-dose ICS-formoterol as the reliever. Response times ranged from 5.56 to 10.07 seconds, with Gemini consistently demonstrating the fastest responses.
Conclusion
ChatGPT 5.0, Microsoft Copilot, and Gemini 2.5 Flash demonstrated high accuracy, safety, and overall adherence to international allergy and asthma guidelines in simulated clinical scenarios. Minor guideline discrepancies underscore the need for continuous model updating and clinician oversight. While these tools cannot replace professional decision-making, they may represent valuable adjuncts for patient education, preliminary triage, and clinical decision support in allergology.
