TNO, Utrechtseweg 48, Zeist, the Netherlands.
DuPont Nutrition and Biosciences, Palo Alto, CA, 94304, USA.
Regul Toxicol Pharmacol. 2019 Oct;107:104422. doi: 10.1016/j.yrtph.2019.104422. Epub 2019 Jul 13.
Alternative and sustainable protein sources (e.g., algae, duckweed, insects) are needed to produce future foods. However, introducing new food sources to the market requires a thorough assessment of nutritional, microbial and toxicological risks and of potential allergic responses, and assessing the allergenic potential of novel proteins is challenging. Current guidance for genetically modified proteins relies on a weight-of-evidence approach. Codex (2009) and EFSA (2010; 2017) guidance indicates that sequence identity to known allergens is acceptable for predicting the cross-reactive potential of novel proteins, whereas resistance to pepsin digestion and glycosylation status are used to evaluate de novo allergenic potential. Other physicochemical and biochemical protein properties, however, are not used in the current weight-of-evidence approach. In this study, we used the Random Forest algorithm to develop an in silico model that predicts the allergenic potential of a protein from its physicochemical and biochemical properties. The final model contains twenty-nine variables, all calculated from the protein sequence with the ProtParam software and the PSIPred Protein Sequence Analysis program. Proteins were classified as allergenic when present in the COMPARE database. Results show robust model performance, with sensitivity, specificity and accuracy each ≥85%. Because the model requires only the protein sequence, it can easily be incorporated into the existing risk assessment approach. In conclusion, the model developed in this study improves the prediction of the allergenicity of new or modified food proteins, as demonstrated for insect proteins.
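The abstract does not list the twenty-nine variables or give model code. As a minimal sketch, assuming the standard ProtParam definitions, the following shows how a few sequence-only descriptors (amino-acid composition, GRAVY on the Kyte-Doolittle scale, aromaticity as the fraction of F, W and Y) can be computed, and how sensitivity, specificity and accuracy follow from a confusion matrix. The function names and the feature subset are hypothetical; this is not the authors' model, and neither the PSIPred secondary-structure features nor the Random Forest training step is reproduced. In practice such feature vectors would be passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier`.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (the scale ProtParam uses for GRAVY)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KYTE_DOOLITTLE)


def sequence_features(seq: str) -> dict:
    """Hypothetical subset of sequence-only descriptors (not the paper's 29)."""
    seq = seq.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or non-standard residues: {sorted(unknown)}")
    counts = Counter(seq)
    n = len(seq)
    # Relative frequency of each of the 20 standard amino acids
    features = {f"frac_{aa}": counts[aa] / n for aa in AMINO_ACIDS}
    features["length"] = float(n)
    # GRAVY: mean hydropathy over all residues
    features["gravy"] = sum(KYTE_DOOLITTLE[aa] for aa in seq) / n
    # Aromaticity: combined relative frequency of Phe, Trp and Tyr
    features["aromaticity"] = (counts["F"] + counts["W"] + counts["Y"]) / n
    return features


def classification_metrics(y_true, y_pred) -> dict:
    """Sensitivity, specificity and accuracy for binary labels (1 = allergen)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # allergens correctly flagged
        "specificity": tn / (tn + fp),  # non-allergens correctly cleared
        "accuracy": (tp + tn) / len(pairs),
    }


if __name__ == "__main__":
    print(sequence_features("MFWY")["aromaticity"])
    print(classification_metrics([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1]))
```

For the toy sequence "MFWY", three of four residues are aromatic, so aromaticity is 0.75; in the metrics example, 2 of 3 allergens and 3 of 4 non-allergens are classified correctly.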