D3.275 - In silico-based assessment of intrinsic protein features for ranking clinical relevance of plant-based food allergens
Background
Risk assessment of food allergenicity remains a major regulatory and scientific challenge, particularly for novel foods and newly expressed proteins, for which human clinical data are unavailable. EFSA and related frameworks apply weight-of-evidence approaches that integrate sequence similarity, physicochemical stability, exposure, serological screening and in silico immunological predictions. However, it remains unclear whether the clinical relevance of allergens reflects consistent intrinsic molecular, structural, immunological or localisation signatures, and therefore whether such features can reliably explain or predict clinical relevance. The aim of the present study was to determine whether EFSA-defined clinical relevance of food allergens corresponds to consistent intrinsic protein signatures, and to evaluate the predictive and mechanistic contributions of physicochemical, structural, MHC class II and localisation features.
Method
Sixty-five clinically established plant-based food allergens were annotated with intrinsic physicochemical, structural, MHC-II binding and localisation features. Clinical relevance was modelled as binary and ordinal outcomes using regularised logistic regression with stratified cross-validation, and predictive performance was evaluated by ROC-AUC and PR-AUC. Feature effects were assessed by permutation importance, FDR-corrected correlations and multivariate analyses (PCA, clustering). A quadrant framework contrasted intrinsic mechanistic signatures with clinical relevance, and sensitivity analyses controlled for protein length.
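A minimal sketch of the cross-validated modelling step, assuming scikit-learn, a numeric feature matrix `X` and a binary clinical-relevance label `y`. The abstract does not state the software, regularisation type or strength, number of folds, or label encoding, so the L2 penalty, `C=1.0`, five folds and 20 permutation repeats are illustrative assumptions, and `evaluate_intrinsic_model` is a hypothetical helper. PR-AUC is estimated here with average precision; only the binary outcome is shown, not the ordinal analysis.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_intrinsic_model(X, y, n_splits=5, C=1.0, n_repeats=20, seed=0):
    """Stratified-CV ROC-AUC, PR-AUC (average precision) and permutation importance.

    X: (n_allergens, n_features) intrinsic features; y: 0/1 clinical relevance.
    All hyperparameters are illustrative assumptions, not the study's settings.
    """
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    roc, pr = [], []
    importance = np.zeros(X.shape[1])
    for train_idx, test_idx in skf.split(X, y):
        # Scaling lives inside the pipeline so it is fit on training folds only.
        # LogisticRegression's default penalty is L2 (ridge); C is its inverse strength.
        model = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=1000))
        model.fit(X[train_idx], y[train_idx])
        p = model.predict_proba(X[test_idx])[:, 1]
        roc.append(roc_auc_score(y[test_idx], p))
        pr.append(average_precision_score(y[test_idx], p))
        # Drop in held-out ROC-AUC when each feature is shuffled, averaged over folds.
        pi = permutation_importance(
            model, X[test_idx], y[test_idx],
            scoring="roc_auc", n_repeats=n_repeats, random_state=seed,
        )
        importance += pi.importances_mean / n_splits
    return float(np.mean(roc)), float(np.mean(pr)), importance


# Usage on synthetic data (65 rows to mirror the cohort size; values are random).
rng = np.random.default_rng(0)
X = rng.normal(size=(65, 6))
y = (X[:, 0] + rng.normal(size=65) > 0).astype(int)
mean_roc, mean_pr, importance = evaluate_intrinsic_model(X, y)
print(f"ROC-AUC {mean_roc:.2f}  PR-AUC {mean_pr:.2f}")
print("permutation importance:", np.round(importance, 3))
```

Computing permutation importance on the held-out fold, rather than on the training data, keeps the importance estimate tied to generalisation performance, which matters with only about 13 allergens per test fold.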
Results
Intrinsic features showed limited but reproducible predictive performance (mean ROC-AUC ≈ 0.6), indicating that intrinsic properties alone are insufficient for accurate prediction of clinical relevance. Nonetheless, robust mechanistic trends emerged: features related to protein exposure, hydrophobicity, structural accessibility and MHC-II binder density contributed consistently across analyses.
PCA identified a dominant axis integrating exposure, physicochemical stability and immunological presentation potential; however, allergens of differing clinical relevance overlapped substantially in intrinsic feature space. Clustering and quadrant analyses revealed both mechanistically concordant allergens (high intrinsic signal and high clinical relevance) and numerous clinically relevant outliers, reflecting discordance between intrinsic features and clinical relevance. MHC-II binding potential showed weak-to-moderate associations with physicochemical features but did not independently define clinical relevance, acting instead as a modulatory component within exposure-driven contexts.
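The quadrant logic can be sketched under explicit assumptions: the "intrinsic signal" axis is taken as the first principal component of standardised features, split at its median, and clinical relevance is binary. The abstract does not say how quadrant thresholds, PC1 orientation or the protein-length adjustment were defined, so the median split, the sign convention via a reference feature (`ref_col`), the quadrant labels, and linear residualisation on log length (`residualise_on_length`, a hypothetical helper) are illustrative choices rather than the study's procedure.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def residualise_on_length(X, length):
    """Hypothetical length adjustment: remove the linear effect of log length per feature."""
    A = np.column_stack([np.ones(len(length)), np.log(length)])
    beta, *_ = np.linalg.lstsq(A, X, rcond=None)
    return X - A @ beta


def assign_quadrants(X, relevance, ref_col=0):
    """Cross a median-split PC1 'intrinsic signal' with binary clinical relevance."""
    Z = StandardScaler().fit_transform(X)
    pca = PCA(n_components=2)
    pc1 = pca.fit_transform(Z)[:, 0]
    # PC signs are arbitrary; orient PC1 so the reference feature loads positively.
    if pca.components_[0, ref_col] < 0:
        pc1 = -pc1
    high_signal = pc1 >= np.median(pc1)
    high_rel = relevance == 1
    # Quadrant names are illustrative labels, not the study's terminology.
    labels = np.where(
        high_signal,
        np.where(high_rel, "concordant (high/high)", "intrinsic outlier"),
        np.where(high_rel, "clinical outlier", "concordant (low/low)"),
    )
    return labels, pca.explained_variance_ratio_


# Usage on synthetic data: features deliberately confounded with length.
rng = np.random.default_rng(1)
length = rng.integers(80, 600, size=65)
X = rng.normal(size=(65, 5)) + 0.01 * length[:, None]
relevance = rng.integers(0, 2, size=65)
X_adj = residualise_on_length(X, length)
labels, evr = assign_quadrants(X_adj, relevance)
for name in np.unique(labels):
    print(name, int(np.sum(labels == name)))
```

Residualising features on length before PCA is one way to run a length sensitivity analysis; comparing quadrant membership with and without `residualise_on_length` shows which assignments are driven by protein size alone.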
Conclusion
This study decouples the predictive and mechanistic roles of intrinsic allergen features in clinical relevance, providing a structured framework to identify where molecular and immunological signatures inform risk assessment and where they fail.
