Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK.
The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK.
Nat Commun. 2019 Sep 4;10(1):3977. doi: 10.1038/s41467-019-11994-0.
The inapplicability of amino acid covariation methods to small protein families has limited their use for structural annotation of whole genomes. Recently, deep learning has shown promise for accurate residue-residue contact prediction, even from shallow sequence alignments. Here we introduce DMPfold, which uses deep learning to predict inter-atomic distance bounds, the main chain hydrogen bond network, and torsion angles, and then uses these predictions to build models iteratively. DMPfold produces more accurate models than two popular methods on a test set of CASP12 domains, and works just as well for transmembrane proteins. Applied to all Pfam domains without known structures, DMPfold produced confident models for 25% of these so-called dark families in under a week on a small 200-core cluster. DMPfold provides models for 16% of human proteome UniProt entries without structures, in some cases generates accurate models from fewer than 100 sequences, and is freely available.