Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative - LCQB, 75005 Paris, France.
PLoS Comput Biol. 2020 Oct 9;16(10):e1007621. doi: 10.1371/journal.pcbi.1007621. eCollection 2020 Oct.
Predicting three-dimensional protein structure and assembling protein complexes from sequence information are among the most prominent tasks in computational biology. Recently, substantial progress has been made for single proteins by combining unsupervised coevolutionary sequence analysis with structurally supervised deep learning. While it reaches impressive accuracy in predicting residue-residue contacts, deep learning has a number of disadvantages. The need for large structural training sets limits its applicability to multi-protein complexes, and the deep architecture of convolutional neural networks makes them intrinsically hard to interpret. Here we introduce FilterDCA, a simpler supervised predictor for inter-domain and inter-protein contacts. It is based on the fact that contact maps of proteins show typical contact patterns, which result from secondary structure and are reflected by patterns in coevolutionary analysis. We explicitly integrate averaged contact patterns with coevolutionary scores derived by Direct Coupling Analysis, improving performance over standard coevolutionary analysis while remaining fully transparent and interpretable. The FilterDCA code is available at http://gitlab.lcqb.upmc.fr/muscat/FilterDCA.
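The idea of integrating contact patterns with coevolutionary scores can be illustrated with a minimal sketch. It is not the FilterDCA implementation: the function names, the hand-made 5x5 diagonal and antidiagonal patterns (loosely mimicking parallel and antiparallel strand pairing), and the weighted-sum combination are all assumptions chosen for illustration. The abstract states only that averaged contact patterns are combined with DCA scores in a supervised predictor.

```python
import numpy as np

def pattern_score(dca, pattern):
    """Slide a k x k pattern over a DCA score matrix (k odd).

    Entry (i, j) of the result is the sum of the DCA scores in the
    k x k window centred on (i, j), weighted by the pattern. Windows
    that extend past the matrix border are zero-padded.
    """
    k = pattern.shape[0]
    half = k // 2
    padded = np.pad(dca, half, mode="constant")
    out = np.zeros(dca.shape, dtype=float)
    for i in range(dca.shape[0]):
        for j in range(dca.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * pattern)
    return out

def combine_scores(dca, patterns, weight=1.0):
    """Hypothetical combination rule: raw DCA score plus a weighted
    best-matching pattern score. A stand-in for a trained predictor."""
    best = np.max([pattern_score(dca, p) for p in patterns], axis=0)
    return dca + weight * best

# Toy patterns (assumed, not averaged from real structures).
parallel = np.eye(5) / 5.0
antiparallel = np.fliplr(np.eye(5)) / 5.0

# Toy inter-domain DCA matrix: a diagonal stripe of moderate scores
# and one isolated, slightly higher score.
dca = np.zeros((10, 12))
for i in range(3, 8):
    dca[i, i + 1] = 1.0
dca[1, 10] = 1.2

combined = combine_scores(dca, [parallel, antiparallel])
top_raw = tuple(int(x) for x in np.unravel_index(np.argmax(dca), dca.shape))
top_combined = tuple(int(x) for x in np.unravel_index(np.argmax(combined), combined.shape))
```

In this toy input the raw scores rank the isolated pair first, whereas the combined scores favour the centre of the stripe, because its neighbours match the parallel pattern. Every step is a plain weighted sum, so the contribution of each pattern to a prediction can be inspected directly.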