D1.67 - Comparison of machine learning models for grass carp allergy diagnosis

Poster abstract

Background

Fish is a common food allergen affecting 0.2-2.3% of the population globally, and grass carp is one of the major culprit species in Hong Kong. Current diagnosis relies on the food challenge, which carries a risk of anaphylaxis. Machine learning enables the development of novel diagnostic models, but existing models have explored little diversity in modelling parameters, including variable selection method, modelling algorithm and predictor set, and this gap remains to be addressed. This study therefore investigates how these modelling parameters influence the performance metrics of a grass carp allergy prediction model.

Method

The testing data set comprises 100 subjects who completed a grass carp food challenge. After data cleaning, 19 diagnostic tests were included in model building. The modelling parameters examined were (i) variable selection method (LASSO, stepwise selection, or both combined), (ii) modelling algorithm (k-nearest neighbours [KNN], random forest [RF], gradient boosting [GB], extreme gradient boosting [XGB], multilayer perceptron [MLP] and support vector machine [SVM]) and (iii) model complexity (number of predictors). The final model was selected on the basis of scoring above 0.6 in all six performance metrics: (i) sensitivity, (ii) specificity, (iii) positive predictive value, (iv) negative predictive value, (v) accuracy and (vi) area under the receiver operating characteristic curve (AUC).
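The six-metric selection rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: it assumes binary labels with 1 = allergic (positive food challenge) and 0 = tolerant, hard class predictions for the first five metrics plus a continuous score for AUC, and it computes AUC as the probability that a randomly chosen allergic subject outscores a randomly chosen tolerant one (the Mann-Whitney formulation). The helper names, the toy data, and the choice to score an undefined ratio (zero denominator) as 0.0 are all hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); 1 = allergic is assumed to be the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative subject")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def safe_div(num: float, den: float) -> float:
    # Hypothetical convention: an undefined ratio scores 0.0, so it fails selection.
    return num / den if den else 0.0


def six_metrics(y_true: Sequence[int], y_pred: Sequence[int], scores: Sequence[float]) -> dict[str, float]:
    """Sensitivity, specificity, PPV, NPV, accuracy and AUC for one model."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
        "accuracy": safe_div(tp + tn, len(y_true)),
        "auc": auc(y_true, scores),
    }


def passes_selection(metrics: dict[str, float], threshold: float = 0.6) -> bool:
    """A model qualifies only if every metric is strictly above the threshold."""
    return all(value > threshold for value in metrics.values())


# Hypothetical toy data for illustration only (not from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.7, 0.3]
metrics = six_metrics(y_true, y_pred, scores)
print(metrics)
print(passes_selection(metrics))
```

On the toy data, each of the first five metrics is 0.75 and AUC is 0.9375, so the model passes. Because the rule uses `all`, a single metric at or below 0.6 is enough to reject a model, which is why strong accuracy alone does not guarantee selection.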

Results

Significant but weak correlations were found between the number of predictors and specificity (r = 0.381, p < 0.001), positive predictive value (r = -0.151, p < 0.05), negative predictive value (r = 0.355, p < 0.001) and AUC (r = 0.233, p < 0.01). Combining LASSO with stepwise variable selection achieved the best performance, followed by LASSO alone and then stepwise selection alone. Of the six modelling algorithms evaluated, KNN, RF and GB ranked equally as the top algorithms, followed by XGB, MLP and SVM. The final prediction model, combining specific IgE to Cten i 1 (grass carp parvalbumin) with a basophil activation test using grass carp extract, achieved a predictive accuracy of 85%.
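The correlation analysis between model complexity and performance can be outlined with the sketch below. The abstract reports r without naming the coefficient, so this assumes Pearson's product-moment correlation, and it assumes each observation is one fitted model configuration, pairing its number of predictors with one of its performance metrics. The function names and toy values are hypothetical. The t statistic, with n - 2 degrees of freedom, is the quantity from which a two-sided p-value is obtained; in practice `scipy.stats.pearsonr` returns r and that p-value together.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length with n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """Statistic for H0: rho = 0, distributed as Student's t with n - 2 df."""
    if abs(r) >= 1:
        # Perfect correlation: the statistic diverges.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical values: number of predictors per model vs. that model's specificity.
n_predictors = [1, 2, 3, 4, 5]
specificity = [0.55, 0.70, 0.62, 0.81, 0.74]
r = pearson_r(n_predictors, specificity)
print(round(r, 3), round(t_statistic(r, len(n_predictors)), 3))
```

With many model configurations, even a modest r such as those reported can reach a small p-value, because the statistic grows with the square root of n - 2; this is consistent with correlations being both significant and weak.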

Conclusion

KNN, RF and GB algorithms combined with LASSO and stepwise variable selection yielded better model performance. A diagnostic model comprising a component test (specific IgE to Cten i 1) and a basophil activation test could aid the diagnosis of grass carp allergy. [This study was supported by the Health and Medical Research Fund (08191356 & 12230466).]