Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
Center for Advancing Human-Machine Partnerships, George Mason University, Fairfax, VA 22030, USA.
Molecules. 2020 May 9;25(9):2228. doi: 10.3390/molecules25092228.
Controlling the quality of tertiary structures computed for a protein molecule remains a central challenge in de novo protein structure prediction. The rule of thumb is to generate as many structures as can be afforded, on the premise that a larger ensemble is more likely to contain some structures near the sought biologically active structure. A major drawback of this approach is that computing and storing a large number of structures is costly in both time and space. In this paper, we propose a novel clustering-based approach that we show significantly reduces the size of an ensemble of generated structures without sacrificing quality. Evaluations are reported on both benchmark and CASP target proteins. The structure ensembles subjected to the proposed approach and its source code are publicly available at the links provided in Section 1.