An improved method to detect correct protein folds using partial clustering.

Affiliation

JHK Co., Ltd., 2049 Heping Road, Shenzhen, Guangdong 518010, China.

Publication information

BMC Bioinformatics. 2013 Jan 16;14:11. doi: 10.1186/1471-2105-14-11.

Abstract

BACKGROUND

Structure-based clustering is commonly used to identify correct protein folds among the candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods have poor runtime performance on large decoy sets. We hypothesized that a more efficient "partial" clustering approach, combined with an improved scoring scheme, could significantly improve both the speed and the accuracy of existing candidate selection methods.

RESULTS

We propose a new scheme that performs rapid but incomplete clustering of protein decoys. Our method detects structurally similar decoys (measured by either Cα RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated this clustering strategy with several different scoring functions and assessed both the accuracy and the speed of identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method improves the correct-fold detection rate under two different quality criteria, and that this improvement is significantly greater than that of two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency tests show that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite.
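The two similarity measures named above can be made concrete with a short sketch. It assumes the model and native Cα coordinates are already index-aligned and superposed (the superposition step itself, e.g. Kabsch, is omitted), so the GDT-TS computed here uses one fixed superposition and is only a lower bound on standard GDT-TS, which searches over superpositions to maximize each fraction. The function names are illustrative and not taken from the paper.

```python
import math

Coord = tuple[float, float, float]


def ca_distances(model: list[Coord], native: list[Coord]) -> list[float]:
    """Per-residue distances between matched C-alpha atoms.

    Assumes both structures are already superposed and index-aligned.
    """
    if len(model) != len(native):
        raise ValueError("structures must have the same number of residues")
    return [math.dist(a, b) for a, b in zip(model, native)]


def ca_rmsd(model: list[Coord], native: list[Coord]) -> float:
    """Root-mean-square deviation of C-alpha positions (in the input units, e.g. angstroms)."""
    d = ca_distances(model, native)
    return math.sqrt(sum(x * x for x in d) / len(d))


def gdt_ts(model: list[Coord], native: list[Coord]) -> float:
    """GDT-TS (0-100) for a single fixed superposition.

    Averages the fraction of residues within 1, 2, 4 and 8 angstroms.
    Because no superposition search is done, this is a lower bound on true GDT-TS.
    """
    d = ca_distances(model, native)
    n = len(d)
    fractions = [sum(1 for x in d if x <= cutoff) / n for cutoff in (1.0, 2.0, 4.0, 8.0)]
    return 100.0 * sum(fractions) / 4


# Toy example: four residues, model displaced by 0.5, 1.5, 3.0 and 9.0 angstroms.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.5), (3.8, 0.0, 1.5), (7.6, 0.0, 3.0), (11.4, 0.0, 9.0)]
print(round(ca_rmsd(model, native), 3))  # 4.809
print(gdt_ts(model, native))  # 56.25
```

The toy example shows why the two measures can disagree: a single badly placed residue inflates RMSD sharply, while GDT-TS only loses that residue's contribution to each cutoff fraction.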

CONCLUSIONS

The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still achieves superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction results.
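The general idea of extracting representatives without assigning every decoy to a cluster can be illustrated with a generic sampling-based greedy scheme. This is a hypothetical sketch, not HS-Forest: the abstract does not describe HS-Forest's internals, and `partial_cluster`, `n_candidates` and the sampling strategy are assumptions made for illustration. Each round counts neighbours only for a random sample of candidate decoys, keeps the one with the most neighbours within `cutoff` as a representative, and removes it and its neighbours from the pool; decoys never within `cutoff` of a chosen representative are simply left unassigned.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def partial_cluster(
    decoys: Sequence[T],
    dist: Callable[[T, T], float],
    cutoff: float,
    n_reps: int,
    n_candidates: int,
    seed: int = 0,
) -> list[int]:
    """Return indices of up to n_reps representative decoys (hypothetical scheme).

    Only n_candidates sampled decoys per round have their neighbourhoods
    counted, so each round costs O(n_candidates * len(pool)) distance calls
    instead of all pairs. Assumes cutoff >= 0 and dist(x, x) == 0.
    """
    rng = random.Random(seed)
    pool = set(range(len(decoys)))
    reps: list[int] = []
    while pool and len(reps) < n_reps:
        candidates = rng.sample(sorted(pool), min(n_candidates, len(pool)))
        best, best_nbrs = candidates[0], []
        for c in candidates:
            # Neighbours of this candidate among decoys not yet covered.
            nbrs = [j for j in pool if dist(decoys[c], decoys[j]) <= cutoff]
            if len(nbrs) > len(best_nbrs):
                best, best_nbrs = c, nbrs
        reps.append(best)
        # Covered decoys leave the pool; uncovered ones are never assigned.
        pool.difference_update(best_nbrs)
    return reps


# Toy example: 1-D "decoys" with absolute difference standing in for RMSD.
decoys = [0.0, 0.1, 0.2, 0.15, 5.0, 5.1, 10.0]
print(partial_cluster(decoys, lambda a, b: abs(a - b), cutoff=0.5, n_reps=2, n_candidates=7))
```

In practice `dist` would be a Cα RMSD (or 100 minus GDT-TS) function; stopping after `n_reps` representatives is what keeps the work below that of a full clustering of the decoy set.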
