D2.517 - #berlinbreathing: An app-based citizen science project for pollen allergies
Background
Clinical studies have shown that an e-diary-based forecast of seasonal allergic rhinitis (SAR) symptoms is possible among patients with high reporting adherence. It is unclear, however, whether symptom prediction is also possible in an unselected, unguided population. Objective: To train and test a forecasting model for pollen allergy symptoms based on e-diary data from a citizen science cohort without medical or scientific supervision.
Method
In March 2025, a press release from Charité – Universitätsmedizin Berlin invited citizens of Berlin (Germany) to support research on pollen allergies by donating their daily symptom data via the app Pollenius (TPS, Rome, Italy). The app provides a daily symptom diary that includes a visual analogue scale (VAS 1-100) for overall allergy symptom severity and an indication of the most affected organ. Medication use and time spent outdoors complete the questionnaire. Data were donated anonymously, without communication with the study team.
The interpolated e-diary data, together with meteorological variables and daily pollen counts for birch and grasses, were included in the forecasting model. A fraction (75%) of the participants who recorded at least 5 days in the e-diary was used to train an XGBoost regression model predicting SAR-related VAS for the next day. Performance of the prediction model was estimated via 5-fold cross-validation.
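The data preparation described above can be illustrated with a minimal sketch. It is not the study's pipeline: the function names, the number of lags (3), the random seed, and the toy diary values are assumptions made for illustration. Meteorological and pollen features, as well as the XGBoost model itself, are left out. The sketch only shows how next-day training rows with lagged VAS values could be built and how participants (rather than individual days) could be split 75/25.

```python
import random


def make_next_day_rows(diary, n_lags=3):
    """Build (participant, lagged VAS, next-day VAS) rows.

    diary maps a participant id to a list of daily VAS values on
    consecutive days (assumed to be already interpolated).
    n_lags is a hypothetical choice, not taken from the abstract.
    """
    rows = []
    for pid, vas in diary.items():
        for t in range(n_lags, len(vas)):
            lags = vas[t - n_lags:t]  # the n_lags days before day t, oldest first
            rows.append((pid, lags, vas[t]))
    return rows


def split_by_participant(participant_ids, train_fraction=0.75, seed=0):
    """Assign whole participants to the training or the held-out set."""
    ids = sorted(set(participant_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return set(ids[:n_train]), set(ids[n_train:])


# Toy diary with four hypothetical participants.
diary = {
    "a": [10, 20, 30, 40, 50],
    "b": [5, 5, 5, 5],
    "c": [60, 55, 50, 45, 40, 35],
    "d": [20, 25, 20, 25],
}
rows = make_next_day_rows(diary)
train_ids, test_ids = split_by_participant(diary.keys())
train_rows = [r for r in rows if r[0] in train_ids]
test_rows = [r for r in rows if r[0] in test_ids]
```

Splitting by participant keeps all days of one person on the same side of the split, so the held-out rows come from people the model has not seen during training.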
Results
During the 2025 pollen season, 6,335 participants joined the citizen science project, recording 132,657 individual days. Of these donors, 2,614 recorded e-diary data for at least 5 days (96,393 days; 72,584 days in the training set). On average, participants recorded symptoms over 47.2 (SD 54.0) days (first to last day of reporting) and provided self-reports on 60% (SD 28%) of those days. By day 8 and day ~50, 50% and 25% of participants, respectively, were still active. Average symptom severity (VAS) was 33.8 (average per participant SD 19.8). The model achieved an R² of 0.71 and an RMSE of 14.7, with lagged symptom reports having the highest feature importance. Overall, the model performed slightly better than a naïve persistence model.
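For readers unfamiliar with the comparison, a naïve persistence model predicts that tomorrow's VAS equals today's. The sketch below computes RMSE and R² for such a baseline on a made-up series. The function names and the toy values are assumptions for illustration and do not reproduce the study's data or evaluation code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def persistence_forecast(vas):
    """Predict each day's VAS as the previous day's VAS.

    Returns (observed, predicted) for day 2 onwards.
    """
    return vas[1:], vas[:-1]


# Toy series for one hypothetical participant.
vas = [10, 20, 30, 40, 50]
y_true, y_pred = persistence_forecast(vas)
baseline_rmse = rmse(y_true, y_pred)  # 10.0: every prediction is off by 10
baseline_r2 = r2(y_true, y_pred)      # about 0.2 (1 - 400/500)
```

A trained model is only informative to the extent that it beats these baseline values on the same held-out days.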
Conclusion
While half of the users lost interest in the app within the first 8 days, those passing this threshold showed high adherence despite being unsupervised. Symptom prediction is also possible in this unselected cohort. However, the prediction model performed only slightly better than the naïve prediction and relied heavily on lagged symptom reports.
