D2.281 - Ensemble Immunoinformatic Prediction Reveals Novel Allergen Candidates in Nigella sativa Seeds and Oil
Background
Allergens, typically proteins capable of triggering adverse immune responses, represent a significant public health challenge. Nigella sativa (black cumin) seeds and oil are increasingly consumed worldwide not only as functional food ingredients, but also as phytomedicine. Despite their widespread use, the allergenic potential of Nigella sativa remains poorly characterized, and no entries are currently registered in the WHO/IUIS allergen database.
Method
We curated a dataset of Nigella sativa proteins by merging NCBI and UniProt entries, yielding 167 unique proteins. These were subjected to an ensemble immunoinformatic pipeline that integrates machine learning, deep learning, structure‑based, and hybrid classifiers into a unified voting system. Predictions were fine‑tuned against WHO/IUIS allergens, and candidates were selected at a threshold of >0.5. In parallel, protein profiling was performed by 1D SDS‑PAGE (14% reducing gel) of seed extracts (whole, cake, and pellet fractions after oil centrifugation), followed by in‑gel digestion and mass spectrometry on an Orbitrap Exploris 240.
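The voting step can be illustrated with a minimal sketch. The abstract does not state how the individual classifier outputs are combined, so the unweighted mean used here is an assumption, as are the function names, the classifier labels, and the scores, which are made up for illustration only.

```python
from statistics import mean

THRESHOLD = 0.5  # candidates are kept when the ensemble score exceeds this value


def ensemble_score(scores):
    """Unweighted mean of per-classifier allergenicity scores in [0, 1].

    Assumption: the abstract does not specify the combination rule.
    """
    return mean(scores.values())


def select_candidates(predictions, threshold=THRESHOLD):
    """Return protein IDs whose ensemble score is strictly above the threshold."""
    return sorted(
        pid for pid, scores in predictions.items()
        if ensemble_score(scores) > threshold
    )


# Hypothetical scores for two placeholder proteins (not data from the study).
example = {
    "protein_A": {"ml": 0.9, "dl": 0.7, "structure": 0.6, "hybrid": 0.8},
    "protein_B": {"ml": 0.2, "dl": 0.6, "structure": 0.3, "hybrid": 0.4},
}
selected = select_candidates(example)
```

In this toy input only protein_A passes the >0.5 cut. A weighted mean or a majority vote over per-classifier calls would be equally plausible readings of "unified voting system".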
Results
The ensemble identified 22 candidate allergens with high confidence. Among these, nigelin and thionin proteins emerged as novel candidates, with no counterparts in the WHO/IUIS database. Nigelin showed no homologous protein in UniRef50 and only a single BLAST hit at 56% identity, underscoring its uniqueness. In contrast, thionin‑1 and thionin‑2 each produced >250 BLAST hits, including numerous homologs from Triticum vulgare (wheat) with >70% identity, suggesting potential cross‑reactivity with common food sources. Complementary proteomics detected nigelin and thionin proteins as moderately to highly abundant components of the low‑molecular‑weight SDS‑PAGE bands, corroborating their allergen candidacy.
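The homology comparison above amounts to counting BLAST hits per query above an identity cut-off. The sketch below assumes hits were exported in BLAST tabular format (outfmt 6, where the third column is percent identity); the abstract does not say how BLAST was run, and the query and subject names and all values are placeholders, not results from the study.

```python
import csv
import io


def count_hits_above_identity(blast_tsv, min_identity=70.0):
    """Count hits per query whose percent identity exceeds min_identity.

    Expects BLAST tabular (outfmt 6) text: qseqid, sseqid, pident, ...
    """
    counts = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, pident = row[0], float(row[2])
        if pident > min_identity:
            counts[query] = counts.get(query, 0) + 1
    return counts


# Placeholder hits for illustration only.
sample = "\n".join([
    "query_1\tsubject_a\t82.5\t45\t8\t0\t1\t45\t1\t45\t1e-20\t90.1",
    "query_1\tsubject_b\t71.0\t45\t13\t0\t1\t45\t1\t45\t1e-15\t75.3",
    "query_1\tsubject_c\t55.0\t40\t18\t0\t1\t40\t1\t40\t1e-05\t40.2",
    "query_2\tsubject_d\t56.0\t50\t22\t0\t1\t50\t1\t50\t1e-06\t42.0",
])
counts = count_hits_above_identity(sample)
```

A query with no hits above the cut-off, such as query_2 here, is simply absent from the result, which mirrors the contrast drawn between nigelin and the thionins.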
Conclusion
This first systematic immunoinformatic assessment of Nigella sativa proteins reveals not only novel allergen candidates but also a functional group absent from current registries. Combining manual curation, ensemble prediction, and proteomic validation provides a scalable framework for identifying emerging allergens in under‑represented food and medicinal sources, and underscores the need to update allergen databases to improve diagnostics and prevention.
