D2.323 - Evaluation of Artificial Intelligence Models with Case Scenarios in the Diagnosis and Management of Pediatric Anaphylaxis

Poster abstract

Background

Anaphylaxis is a severe, multisystem hypersensitivity reaction that is increasingly seen in children. Diagnosis relies solely on clinical criteria, so tools that support the diagnostic process are needed. This study evaluates artificial intelligence models as decision-support tools for the diagnosis and management of pediatric anaphylaxis.

Method

The free versions of ChatGPT-4o (OpenAI) and Gemini 2.5 Flash (Google) were used without logging in to evaluate pediatric case scenarios. Each model was presented with nine case scenarios. The first prompt requested the diagnosis, treatment, and follow-up. If the response to the first prompt included no references, a second prompt asked the model to provide them. Two pediatric allergy–immunology specialists evaluated the responses for diagnosis, treatment, follow-up, and references in the anaphylaxis cases, and for diagnostic accuracy in the differential diagnosis scenarios.

Results

ChatGPT-4o correctly diagnosed three of four anaphylaxis cases, while Gemini 2.5 Flash correctly identified all four. When anaphylaxis was correctly diagnosed, both models recommended intramuscular adrenaline as first-line treatment. In confirmed anaphylaxis cases, reference accuracy was 52% (10/19) for ChatGPT-4o and 76% (38/50) for Gemini 2.5 Flash. Both models incorrectly classified a case of vasovagal syncope and a case of acute urticaria with an acute asthma exacerbation as anaphylaxis.
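The proportions reported above can be checked with a short calculation. The following is a minimal sketch, not part of the study's methods: the counts are copied from the Results paragraph, and the function name `accuracy_percent` and the dictionary layout are illustrative assumptions. Computed to one decimal place, 10/19 is about 52.6% and 38/50 is 76.0%.

```python
# Counts copied from the Results paragraph: (correct, total).
# The variable and function names are illustrative, not from the study.
reference_counts = {
    "ChatGPT-4o": (10, 19),
    "Gemini 2.5 Flash": (38, 50),
}
diagnosis_counts = {
    "ChatGPT-4o": (3, 4),
    "Gemini 2.5 Flash": (4, 4),
}


def accuracy_percent(correct: int, total: int) -> float:
    """Return correct/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return round(100 * correct / total, 1)


for label, counts in (("Reference accuracy", reference_counts),
                      ("Anaphylaxis diagnosis", diagnosis_counts)):
    for model, (correct, total) in counts.items():
        pct = accuracy_percent(correct, total)
        print(f"{label}, {model}: {correct}/{total} = {pct:.1f}%")
```

With denominators this small (4 cases, 19 and 50 references), a single case or reference shifts the percentage considerably, so the raw counts are worth reading alongside the percentages.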

Conclusion

ChatGPT-4o and Gemini 2.5 Flash demonstrate potential in diagnosing and managing pediatric anaphylaxis; however, their performance declines in complex cases, as seen in the misclassified differential diagnosis scenarios. As large language models advance, further studies are needed to assess their reliability as clinical decision-support tools.