D3.30 - Automated generation of structured clinical databases in drug hypersensitivity: real-world validation in patients with proton pump inhibitor reactions

Poster abstract

Background

Retrospective clinical research in drug hypersensitivity relies largely on information contained in routine medical reports. However, the unstructured and heterogeneous nature of these narratives is a major obstacle to building high-quality clinical databases, limiting scalability, reproducibility and data reliability. Artificial intelligence–based methods for automated data structuring may overcome these limitations, but they require rigorous validation in real-world clinical settings.

Method

We conducted a real-world validation study to assess the performance of an artificial intelligence–based, data-driven research framework (Latrikos) for the automated extraction and structuring of clinical variables from unstructured medical reports in patients with suspected hypersensitivity to proton pump inhibitors (PPIs).

The cohort included consecutive patients evaluated between 2018 and 2025 at a tertiary referral Allergy Unit (Hospital Universitario Ramón y Cajal, Madrid, Spain).

The methodology integrates three components: domain-adapted natural language processing models trained on allergy-specific clinical data and terminology, a predefined ontology-based variable structure, and a privacy-by-design data processing workflow that operates independently of hospital information systems.
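A predefined, ontology-based variable structure can be pictured as a fixed list of variables, each with a controlled vocabulary, against which every extracted record is checked. The sketch below is purely illustrative: the class VariableSpec, the function validate, the variable names and their allowed values are all hypothetical and are not taken from the Latrikos framework, whose internal ontology the abstract does not describe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariableSpec:
    """One predefined variable in a hypothetical ontology-based schema."""
    name: str
    allowed_values: frozenset  # controlled vocabulary for this variable
    is_key: bool = False       # True for key diagnostic/cross-reactivity variables


# Hypothetical schema fragment; the real variable definitions are not published here.
SCHEMA = [
    VariableSpec("skin_test_result", frozenset({"positive", "negative", "not_done"}), is_key=True),
    VariableSpec("drug_provocation_test_result", frozenset({"positive", "negative", "not_done"}), is_key=True),
    VariableSpec("reaction_timing", frozenset({"immediate", "non_immediate", "unknown"})),
]


def validate(record: dict) -> list:
    """Return the names of variables that are missing or hold a value outside the vocabulary."""
    problems = []
    for spec in SCHEMA:
        value: Optional[str] = record.get(spec.name)  # None if the variable was not extracted
        if value not in spec.allowed_values:
            problems.append(spec.name)
    return problems
```

For example, a record with reaction_timing set to "late" would be flagged for that variable only, because "late" is not in its (hypothetical) vocabulary. Fixing the vocabulary in advance is what makes the output comparable across patients.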

A total of 172 patients were included. Seventy-seven variables were defined a priori for each patient, including 25 key variables related to diagnostic procedures and cross-reactivity assessment (skin tests, drug provocation tests and in vitro assays). The resulting dataset comprised 13,244 individual data points (172 patients × 77 variables). All automatically extracted variables were manually reviewed by experienced allergists, and any discrepancy between the extracted and the reviewed value was classified as an extraction error.
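The validation design can be sketched as a cell-by-cell comparison between the automatically extracted table and the allergist-reviewed table, where each (patient, variable) pair is one data point and every mismatch counts as one error. The function count_discrepancies and the toy values below are hypothetical illustrations of that counting rule, not the study's actual review tooling.

```python
def count_discrepancies(extracted: dict, reviewed: dict) -> tuple:
    """Compare extracted values with allergist-reviewed values.

    Both arguments map (patient_id, variable_name) -> value. Every reviewed
    data point is counted, and any mismatch (including a missing extraction)
    is counted as one extraction error. Returns (errors, total).
    """
    total = len(reviewed)
    errors = sum(1 for key, gold in reviewed.items() if extracted.get(key) != gold)
    return errors, total


# Dataset size implied by the study design: one data point per patient per variable.
N_PATIENTS, N_VARIABLES, N_KEY_VARIABLES = 172, 77, 25
total_points = N_PATIENTS * N_VARIABLES      # 13,244 data points
key_points = N_PATIENTS * N_KEY_VARIABLES    # 4,300 key data points

# Hypothetical toy example with one patient and two variables.
extracted = {(1, "skin_test_result"): "positive", (1, "reaction_timing"): "immediate"}
reviewed = {(1, "skin_test_result"): "positive", (1, "reaction_timing"): "non_immediate"}
```

Here count_discrepancies(extracted, reviewed) returns (1, 2): the skin test value agrees, while the reaction timing value does not.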

Results

A total of 138 errors were identified among the 13,244 extracted data points, corresponding to an overall accuracy of 98.96%. For the key diagnostic variables, 27 errors were observed among 4,300 data points (172 patients × 25 variables), yielding an accuracy of 99.37%. Most discrepancies arose from semantic ambiguity in the original clinical narratives or from limitations in the initial variable definitions, particularly for symptom classification and reaction timing. Importantly, the entire structured database was generated and validated within a single working day.
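The reported accuracies follow directly from the error counts, with accuracy defined as the proportion of data points extracted without error (1 − errors / total). The short check below reproduces both figures from the numbers given in the abstract; the helper name accuracy is only for illustration.

```python
def accuracy(errors: int, total: int) -> float:
    """Proportion of data points extracted without error."""
    return 1 - errors / total


overall = accuracy(138, 13_244)  # all 77 variables for 172 patients
key = accuracy(27, 4_300)        # 25 key diagnostic variables for 172 patients
print(f"overall: {overall:.2%}, key: {key:.2%}")
```

Running it prints "overall: 98.96%, key: 99.37%", matching the values reported above.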

Conclusion

In this real-world cohort of patients with suspected PPI hypersensitivity, the proposed artificial intelligence–based methodology achieved high accuracy (98.96% overall and 99.37% for key diagnostic variables) in extracting and structuring clinically relevant variables from unstructured medical reports. These findings support the feasibility of scalable, reproducible and high-quality database construction directly from routine clinical documentation, even in non-standardized retrospective settings. Such approaches may substantially reduce manual workload and enable large-scale observational research in drug allergy.