Johansson-Åkhe Isak, Mirabello Claudio, Wallner Björn
Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden.
Front Bioinform. 2021 Oct 25;1:763102. doi: 10.3389/fbinf.2021.763102. eCollection 2021.
Peptide-protein interactions between a smaller or disordered peptide stretch and a folded receptor make up a large part of all protein-protein interactions. A common approach for modeling such interactions is to exhaustively sample the conformational space by fast-Fourier-transform docking, and then refine a top percentage of decoys. Methods that can rank decoys for selection fast enough for larger-scale studies commonly rely on first-principle energy terms such as electrostatics and van der Waals forces, or on pre-calculated statistical potentials. We present InterPepRank for peptide-protein complex scoring and ranking. InterPepRank is a machine learning-based method that encodes the structure of the complex as a graph, with physical pairwise interactions as edges and evolutionary and sequence features as nodes. The graph network is trained to predict the LRMSD of decoys by using edge-conditioned graph convolutions on a large set of peptide-protein complex decoys. InterPepRank is tested on a massive independent test set in which no target shares CATH annotation or 30% sequence identity with any target in the training or validation data. On this set, InterPepRank has a median AUC of 0.86 for finding coarse peptide-protein complexes with LRMSD < 4 Å. This is an improvement over other state-of-the-art ranking methods, which have a median AUC between 0.65 and 0.79. When used to select decoys for refinement in a previously established peptide docking pipeline, InterPepRank increases the number of medium and high quality models produced by 80% and 40%, respectively. The InterPepRank program as well as all scripts for reproducing and retraining it are available from: .
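To make the evaluation metric concrete, the following is a minimal sketch of how a per-target AUC and its median across targets could be computed. It is not taken from the InterPepRank code. It assumes that the AUC is the ROC AUC of each target's decoy ranking, that decoys with a true LRMSD below 4 Å are the positives, and that a lower predicted LRMSD means a better rank. The function names `ranking_auc` and `median_auc` and all numbers are hypothetical and serve only as illustration.

```python
from statistics import median


def ranking_auc(predicted_lrmsd, true_lrmsd, cutoff=4.0):
    """ROC AUC for one target (illustrative helper, not the authors' code).

    Positives are decoys whose true LRMSD is below `cutoff` (in Angstrom).
    The AUC equals the probability that a random positive decoy gets a lower
    predicted LRMSD than a random negative decoy; ties count as one half.
    """
    pairs = list(zip(predicted_lrmsd, true_lrmsd))
    positives = [pred for pred, true in pairs if true < cutoff]
    negatives = [pred for pred, true in pairs if true >= cutoff]
    if not positives or not negatives:
        return None  # AUC is undefined when one class is empty

    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos < neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def median_auc(targets):
    """Median AUC over targets, skipping targets with an undefined AUC."""
    aucs = [ranking_auc(pred, true) for pred, true in targets]
    return median(a for a in aucs if a is not None)


# Toy data, invented for illustration: (predicted LRMSD, true LRMSD) per target.
toy_targets = [
    ([1.0, 2.0, 8.0, 9.0], [2.0, 3.0, 10.0, 12.0]),  # perfect ranking
    ([1.0, 5.0, 3.0, 9.0], [2.0, 3.0, 10.0, 12.0]),  # one misordered pair
]
print(median_auc(toy_targets))
```

The pairwise comparison is quadratic in the number of decoys, which is fine for a sketch; for the decoy counts produced by exhaustive docking, a rank-based formulation of the same statistic would be the more practical choice.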