D2.548 - Expert Evaluation of Dialogue Quality and Clinical Reasoning in a Hybrid Multilingual LLM–Algorithm Allergy Decision-Support System: A Pilot Study of 34 Real-World Cases

Poster abstract

Background

Large language models (LLMs) are increasingly integrated into clinical decision-support systems. In allergy practice, accurate diagnosis depends heavily on structured and comprehensive history-taking. However, most AI evaluations focus on diagnostic accuracy rather than on the quality of the clinical dialogue and reasoning process. This study aimed to assess dialogue quality, clinical reasoning concordance, laboratory test selection logic, and safety handling in AllergoChat, a hybrid multilingual LLM–algorithm allergy decision-support system.

Method

We conducted a retrospective expert evaluation of 34 anonymized real-world allergy dialogue cases (68 independent expert assessments). Three board-certified allergists (two practicing in Kazakhstan and one in Ukraine) independently assessed dialogue completeness, question relevance, dialogue logic, expert-defined phenotype, concordance with system conclusions, appropriateness of test strategy, and safety performance. Dialogue completeness was summarized as a percentage score based on predefined key allergy history domains. All analyzed cases were Russian-language dialogues.

Results

Mean completeness of allergy-focused history-taking was 95.1 ± 6.7%. Question relevance and dialogue logic were both rated highly (4.59 ± 0.81 and 4.97 ± 0.24, respectively). System conclusions were rated as correct in 97.06% of expert assessments, and none were rated as incorrect. Test strategy was considered optimal in 82.35% of cases and overly broad in 17.65%. No safety-critical situations were missed.

Conclusion

In this pilot expert validation, AllergoChat demonstrated high-quality structured dialogue, strong concordance with allergist-defined phenotypes, and robust safety performance. These findings support further prospective evaluation of clinically structured hybrid AI decision-support systems in allergy care.