Grønning Alexander G B, Kacprowski Tim, Schéele Camilla
Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark.
Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, TU Braunschweig and Hannover Medical School, 38106 Braunschweig, Germany.
Biol Methods Protoc. 2021 Nov 23;6(1):bpab021. doi: 10.1093/biomethods/bpab021. eCollection 2021.
Peptide-based therapeutics are here to stay and will prosper in the future. A key step in identifying novel peptide drugs is determining their bioactivities. Recent advances in peptidomics screening approaches hold promise as a strategy for identifying novel drug targets. However, these screens typically generate an immense number of peptides, so tools for ranking the peptides before planning functional studies are needed. Although a few tools in the literature predict multiple classes, they are constructed from multiple binary classifiers. Here we aimed to use an innovative deep learning approach to build an improved peptide bioactivity classifier capable of distinguishing between multiple classes. We present MultiPep: a deep learning multi-label classifier that assigns peptides to zero or more of 20 bioactivity classes. We train and test MultiPep on data from several publicly available databases. The same data are used for a hierarchical clustering, whose dendrogram shapes the architecture of MultiPep. We test a new loss function that combines a customized version of the Matthews correlation coefficient (MCC) with binary cross entropy (BCE), and show that it performs better than class-weighted BCE as a loss function. Further, we show that MultiPep surpasses state-of-the-art peptide bioactivity classifiers and that it predicts known and novel bioactivities of FDA-approved therapeutic peptides. In conclusion, we present innovative machine learning techniques used to produce a peptide prediction tool to aid peptide-based therapy development and hypothesis generation.
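The abstract does not give the formula for the customized MCC term, so the following is only a minimal sketch of one common way to combine MCC with BCE in a multi-label setting, not MultiPep's actual loss. It assumes a "soft" MCC in which true/false positive and negative counts are computed from predicted probabilities rather than thresholded 0/1 labels (which keeps the term differentiable), averaged over classes, and mixed with BCE through a hypothetical weight `alpha`. The function names `soft_mcc` and `mcc_bce_loss` are illustrative; in a real training loop the same arithmetic would be written with autodiff tensors instead of NumPy arrays.

```python
import numpy as np


def bce(y_true, y_prob, eps=1e-7):
    """Mean binary cross entropy over all samples and all labels."""
    p = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))


def soft_mcc(y_true, y_prob, eps=1e-7):
    """Per-class MCC computed from probabilities instead of hard predictions.

    y_true, y_prob: arrays of shape (n_samples, n_classes).
    Returns an array of shape (n_classes,) with values in roughly [-1, 1].
    """
    tp = np.sum(y_true * y_prob, axis=0)
    tn = np.sum((1.0 - y_true) * (1.0 - y_prob), axis=0)
    fp = np.sum((1.0 - y_true) * y_prob, axis=0)
    fn = np.sum(y_true * (1.0 - y_prob), axis=0)
    numerator = tp * tn - fp * fn
    denominator = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / (denominator + eps)  # eps guards classes with no positives


def mcc_bce_loss(y_true, y_prob, alpha=0.5):
    """Hypothetical combined loss: alpha * (1 - mean soft MCC) + (1 - alpha) * BCE."""
    mcc_term = 1.0 - float(np.mean(soft_mcc(y_true, y_prob)))
    return alpha * mcc_term + (1.0 - alpha) * bce(y_true, y_prob)


# Four peptides, three bioactivity classes (multi-label: a row may have 0+ ones).
y_true = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]], dtype=float)
uninformative = np.full_like(y_true, 0.5)
print(mcc_bce_loss(y_true, y_true))         # near 0 for perfect predictions
print(mcc_bce_loss(y_true, uninformative))  # larger: soft MCC is 0 here
```

Unlike class-weighted BCE, which rescales each label's error by a fixed weight, the MCC term rewards balanced agreement on both positives and negatives within each class, which is one plausible reason such a combination can help with rare bioactivity classes.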