Department of Statistics, University of Oxford, Oxford OX1 3LB, UK.
Department of Informatics, UCB Pharma, Slough SL1 3WE, UK.
Bioinformatics. 2018 Apr 1;34(7):1132-1140. doi: 10.1093/bioinformatics/btx722.
Most current de novo structure prediction methods sample protein conformations randomly and thus require large amounts of computational resources. Here, we consider a sequential sampling strategy, building on ideas from recent experimental work that shows many proteins fold cotranslationally.
We have investigated whether a pseudo-greedy search approach, which begins sequentially from one of the termini, can improve the performance and accuracy of de novo protein structure prediction. We observed that our sequential approach converges before 20 000 decoys have been produced, fewer than is commonly expected. Using our software, SAINT2, we also compared the run time and quality of models produced in a sequential fashion against a standard, non-sequential approach. Sequential prediction produces an individual decoy 1.5-2.5 times faster than non-sequential prediction. When considering the quality of the best model, sequential prediction produced a better model for 31 out of 41 soluble protein validation cases and for 18 out of 24 transmembrane protein cases. Correct models (TM-Score > 0.5) were produced for 29 of these cases by the sequential mode and for only 22 by the non-sequential mode. Our comparison reveals that a sequential search strategy can drastically reduce the computational time of de novo protein structure prediction and improve accuracy.
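The correctness criterion above (TM-Score > 0.5) comes from the standard TM-score definition, which averages 1/(1 + (d_i/d0)^2) over residues, normalised by the target length, with a length-dependent scale d0 = 1.24(L - 15)^(1/3) - 1.8. The sketch below is illustrative only and is not taken from SAINT2: it assumes the per-residue distances d_i (in Angstrom) for one fixed superposition are already known, omits the search over superpositions that a real TM-score program performs, and uses hypothetical helper names. The 0.5 floor on d0 for very short chains is also an assumption.

```python
from typing import Sequence


def d0(target_length: int) -> float:
    """Length-dependent distance scale (Angstrom) used by TM-score."""
    # Assumed floor of 0.5 where the formula is undefined or too small (short chains).
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE given superposition (no optimisation over superpositions).

    `distances` holds the model-to-target distance of each aligned residue;
    unaligned residues simply contribute nothing to the sum.
    """
    scale = d0(target_length)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / target_length


def is_correct(distances: Sequence[float], target_length: int) -> bool:
    """Hypothetical helper applying the 'correct model' threshold TM-Score > 0.5."""
    return tm_score_fixed(distances, target_length) > 0.5


# Usage: a 100-residue model whose residues all lie exactly d0 from the target
# sits right at the threshold (each term is 1/(1 + 1) = 0.5).
L = 100
print(tm_score_fixed([d0(L)] * L, L))
print(is_correct([1.0] * L, L))
```

Because each residue term saturates at 1 and decays smoothly with distance, a score above 0.5 means most residues lie within roughly d0 of their targets, which is why it serves as a topology-level notion of a correct fold.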
Data are available for download from: http://opig.stats.ox.ac.uk/resources. SAINT2 is available for download from: https://github.com/sauloho/SAINT2.
saulo.deoliveira@dtc.ox.ac.uk.
Supplementary data are available at Bioinformatics online.