Department of Computer Science, Nagoya Institute of Technology, Nagoya, Aichi, Japan.
Division of Biochemistry, National Institute of Health Sciences, Kawasaki, Kanagawa, Japan.
J Biol Chem. 2023 Jun;299(6):104733. doi: 10.1016/j.jbc.2023.104733. Epub 2023 Apr 21.
Cutting-edge technologies such as genome editing and synthetic biology allow us to produce novel foods and functional proteins. However, their toxicity and allergenicity must be accurately evaluated. Specific amino acid sequences are known to make some proteins allergenic, but many of these sequences remain uncharacterized. In this study, we introduce a data-driven, machine-learning approach to find undiscovered allergen-specific patterns (ASPs) among amino acid sequences. The proposed method enables an exhaustive search for amino acid subsequences whose frequencies are statistically significantly higher in allergenic proteins. As a proof of concept, we created a database of 21,154 proteins for which the presence or absence of allergic reactions is already known and applied the proposed method to it. The ASPs detected in this proof-of-concept study were consistent with known biological findings, and allergenicity prediction using the detected ASPs outperformed extant approaches, indicating that this method may be useful in evaluating the utility of synthetic foods and proteins.
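The idea of searching for subsequences that are significantly more frequent in allergenic proteins can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes contiguous subsequences of a single fixed length `k`, a one-sided Fisher exact test on presence/absence counts, and a plain Bonferroni correction, all of which are simplifications chosen for illustration. The function names (`kmers`, `fisher_greater`, `enriched_patterns`) are hypothetical.

```python
from math import comb


def kmers(seq, k):
    """Return the set of distinct contiguous length-k subsequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a = allergens containing the pattern, b = allergens lacking it,
    c = non-allergens containing it,      d = non-allergens lacking it.
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    n_pos, n_neg, m = a + b, c + d, a + c
    total = comb(n_pos + n_neg, m)
    upper = min(n_pos, m)
    tail = sum(comb(n_pos, x) * comb(n_neg, m - x) for x in range(a, upper + 1))
    return tail / total


def enriched_patterns(allergens, non_allergens, k=3, alpha=0.05):
    """List (pattern, p-value) pairs enriched in allergens, smallest p first.

    Assumption: Bonferroni correction over every k-mer seen in the allergen
    set, which is conservative but easy to follow.
    """
    pos_counts, neg_counts = {}, {}
    for seq in allergens:
        for p in kmers(seq, k):
            pos_counts[p] = pos_counts.get(p, 0) + 1
    for seq in non_allergens:
        for p in kmers(seq, k):
            neg_counts[p] = neg_counts.get(p, 0) + 1

    n_tests = len(pos_counts)
    hits = []
    for p, a in pos_counts.items():
        c = neg_counts.get(p, 0)
        pval = fisher_greater(a, len(allergens) - a, c, len(non_allergens) - c)
        if pval * n_tests <= alpha:  # Bonferroni-adjusted threshold
            hits.append((p, pval))
    return sorted(hits, key=lambda t: t[1])
```

Calling `enriched_patterns(allergen_seqs, control_seqs, k=3)` on two lists of amino acid strings returns the patterns that pass the corrected threshold. A search over all subsequence lengths, as the abstract describes, would need a less conservative multiple-testing strategy than the Bonferroni correction assumed here.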