D3.431 - Can Artificial Intelligence-Based Large Language Models Provide Accurate and Reliable Information to Asthma Patients? A Comparative Analysis with Expert Insights

Poster abstract

Background

Providing patients with accurate and reliable information significantly enhances their quality of life while reducing the burden on healthcare services. Advances in artificial intelligence (AI) have the potential to support patients in better understanding and managing their health conditions. AI-based large language models (LLMs) can provide rapid and accurate information about symptoms and general health topics. This study compares an asthma patient information leaflet generated by an LLM with the leaflet provided by the Turkish National Society of Allergy and Clinical Immunology (TNSACI), as evaluated by expert physicians, and aims to provide guidance for enhancing existing educational materials.

Method

Physicians with at least five years of experience in allergy and immunology were asked to assess blinded versions of two texts: one generated by ChatGPT-4.0 and the other sourced from TNSACI's asthma patient information leaflet. Participants rated the texts on a Likert scale for accuracy, comprehensibility, level of detail, consistency, reliability, and overall satisfaction. Readability was additionally assessed using the Flesch-Kincaid formulas.
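For reference, the Flesch Reading Ease score, one of the Flesch-Kincaid formulas, can be sketched as below. This is the standard English-language version with its usual coefficients; the abstract does not specify the exact tooling the study used or whether a Turkish-language adaptation was applied, so treat this as an illustrative assumption. The grade bands in the docstring are the conventional interpretation and are consistent with the score-to-grade mappings reported in the Results.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula (higher = easier to read).

    Conventional bands: 60-70 ~ 8th-9th grade, 50-60 ~ 10th-12th grade,
    30-50 ~ college level.
    """
    # Average sentence length and average syllables per word drive the score.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
```

For example, a 100-word text with 5 sentences and 150 syllables scores 206.835 − 1.015·20 − 84.6·1.5 ≈ 59.6, i.e. roughly 10th-12th grade reading level.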

Results

A total of 21 physicians participated, with a mean age of 38.4 ± 4.9 years and a mean professional experience of 6.6 ± 2.7 years. The ChatGPT text comprised 973 words and had a readability score of 56.3 (10th-12th grade), while the TNSACI text contained 1,603 words with a readability score of 48.5 (college level). Likert scale evaluations showed that the two texts were similar in accuracy, consistency, and reliability; however, the ChatGPT text scored significantly higher in comprehensibility (p=0.03). Although both texts were considered sufficiently detailed, the TNSACI text was rated as excessively detailed significantly more often than the ChatGPT text (p=0.01). Responses to the Likert scale questions are presented in Table 1. In response to the question "Which text do you prefer overall?", 57.1% of participants (n=12) preferred the ChatGPT text, 4.8% (n=1) preferred the TNSACI text, and 38.1% (n=8) indicated a preference for both texts (Figure 1).

Conclusion

Expert physicians found the asthma patient information text generated by ChatGPT more comprehensible than the TNSACI text and generally preferred it. Although the greater length of the TNSACI text may have introduced bias, the study provides valuable insights for updating existing educational materials.