


An improved method to detect correct protein folds using partial clustering.

Affiliation

JHK Co., Ltd., 2049 Heping Road, Shenzhen, Guangdong 518010, China.

Publication details

BMC Bioinformatics. 2013 Jan 16;14:11. doi: 10.1186/1471-2105-14-11.

DOI:10.1186/1471-2105-14-11
PMID:23323835
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3626854/
Abstract

BACKGROUND

Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit poor runtime performance on large decoy sets. We hypothesized that a more efficient "partial" clustering approach, combined with an improved scoring scheme, could significantly improve both the speed and the performance of existing candidate selection methods.
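To make the runtime problem concrete, here is a minimal Python sketch of exhaustive structure-based clustering: every pair of decoys is compared, and the decoy with the most neighbours within a distance cutoff is returned as the representative. This is a generic illustration, not any specific published tool. The function name `largest_cluster_center`, the caller-supplied `dist` function and the `cutoff` are assumptions; in practice `dist` would be a Cα RMSD computed after superposition. The call counter shows the N(N-1)/2 pairwise cost.

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def largest_cluster_center(
    decoys: Sequence[T],
    dist: Callable[[T, T], float],
    cutoff: float,
) -> tuple[int, int]:
    """Return (index of the decoy with the most neighbours, number of dist() calls).

    Exhaustive (hypothetical) version: every unordered pair of decoys is compared
    exactly once, so the cost is N * (N - 1) / 2 distance evaluations.
    """
    n = len(decoys)
    if n == 0:
        raise ValueError("decoy set is empty")
    neighbours = [0] * n  # neighbour count per decoy
    calls = 0
    for i, j in combinations(range(n), 2):
        calls += 1
        if dist(decoys[i], decoys[j]) <= cutoff:
            # a pair within the cutoff counts as a neighbour for both decoys
            neighbours[i] += 1
            neighbours[j] += 1
    # the decoy with the most neighbours is the representative (ties -> lowest index)
    best = max(range(n), key=neighbours.__getitem__)
    return best, calls


# Toy example: 1-D "decoys", with absolute difference standing in for Cα RMSD.
toy = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1]
print(largest_cluster_center(toy, lambda a, b: abs(a - b), cutoff=0.25))  # prints (1, 15)
```

Because the number of comparisons grows quadratically, a set of 100,000 decoys already needs about 5 × 10^9 distance evaluations, each of which involves a structural superposition in a real pipeline.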

RESULTS

We propose a new scheme that performs rapid but incomplete clustering on protein decoys. Our method detects structurally similar decoys (measured using either Cα RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated our new clustering strategy with several different scoring functions to assess both the performance and speed in identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method can improve the correct fold detection rate as assessed by two different quality criteria. This improvement is significantly larger than that achieved by two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency testing shows that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite.
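The two similarity measures named above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's implementation: both functions assume the model and reference Cα coordinates are already superposed (a real pipeline would first superpose them, for example with the Kabsch or QCP method), and `gdt_ts` uses that single fixed superposition instead of searching for the best superposition per cutoff as the standard GDT-TS definition does. The names `ca_rmsd` and `gdt_ts` are hypothetical.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]  # Cα position in Å


def ca_rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Cα RMSD (Å) between two equal-length coordinate lists.

    Assumes the structures are already optimally superposed.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(model, reference))
    return math.sqrt(squared / len(model))


def gdt_ts(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Simplified GDT-TS (0-100) for one fixed superposition.

    Averages the percentage of Cα atoms within 1, 2, 4 and 8 Å of the reference.
    The standard definition maximises each percentage over superpositions,
    so this fixed-superposition value can underestimate it.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    deviations = [math.dist(p, q) for p, q in zip(model, reference)]
    n = len(deviations)
    percentages = [100.0 * sum(d <= cutoff for d in deviations) / n
                   for cutoff in (1.0, 2.0, 4.0, 8.0)]
    return sum(percentages) / 4.0


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.5, 0.0, 0.0), (1.0, 1.5, 0.0), (2.0, 0.0, 3.0), (3.0, 10.0, 0.0)]
print(gdt_ts(model, reference))  # prints 56.25
print(ca_rmsd(model, reference))
```

The example also shows why the two measures behave differently: the single 10 Å outlier dominates the RMSD, while GDT-TS only loses that residue from its percentages.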

CONCLUSIONS

The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still allows superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction performance.
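The general idea of avoiding clustering every decoy can be illustrated with a simple sampling scheme. This is a hypothetical Python sketch and not the HS-Forest algorithm, whose details are not given in this abstract: only a random sample of candidate centres is scored against all decoys, and the candidate with the most neighbours within the cutoff is returned. The function name `sampled_center` and the parameters `n_candidates` and `seed` are illustrative assumptions, and `dist` is again a caller-supplied stand-in for a structural distance such as Cα RMSD.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def sampled_center(
    decoys: Sequence[T],
    dist: Callable[[T, T], float],
    cutoff: float,
    n_candidates: int,
    seed: int = 0,
) -> tuple[int, int, int]:
    """Return (best candidate index, its neighbour count, number of dist() calls).

    Hypothetical sketch: only n_candidates randomly chosen decoys are scored as
    potential centres, each against all other decoys, so the cost is about
    n_candidates * (N - 1) distance evaluations. No decoy is assigned to a cluster.
    Returns (-1, -1, 0) for an empty decoy set.
    """
    n = len(decoys)
    rng = random.Random(seed)  # seeded for reproducible sampling
    candidates = rng.sample(range(n), min(n_candidates, n))
    best, best_count, calls = -1, -1, 0
    for c in candidates:
        count = 0
        for j in range(n):
            if j == c:
                continue
            calls += 1
            if dist(decoys[c], decoys[j]) <= cutoff:
                count += 1
        if count > best_count:  # keep the first candidate with the highest count
            best, best_count = c, count
    return best, best_count, calls


toy = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1]
best, count, calls = sampled_center(toy, lambda a, b: abs(a - b), 0.25, n_candidates=2)
print(best, count, calls)  # calls is 2 * (6 - 1) = 10
```

With `n_candidates` much smaller than N, the cost grows linearly rather than quadratically in N, at the price of possibly returning a less central decoy when the best centre is not in the sample.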


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0f3/3626854/925bd8ea8663/1471-2105-14-11-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0f3/3626854/34f519411624/1471-2105-14-11-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0f3/3626854/041b79a46b77/1471-2105-14-11-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0f3/3626854/9c7878ec1270/1471-2105-14-11-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0f3/3626854/5c8807304bb5/1471-2105-14-11-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0f3/3626854/60f8d3947328/1471-2105-14-11-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0f3/3626854/0a11845af535/1471-2105-14-11-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0f3/3626854/dd5598b6bd4f/1471-2105-14-11-8.jpg

Similar articles

1
An improved method to detect correct protein folds using partial clustering.
BMC Bioinformatics. 2013 Jan 16;14:11. doi: 10.1186/1471-2105-14-11.
2
Entropy-accelerated exact clustering of protein decoys.
Bioinformatics. 2011 Apr 1;27(7):939-45. doi: 10.1093/bioinformatics/btr072. Epub 2011 Feb 9.
3
Calibur: a tool for clustering large numbers of protein decoys.
BMC Bioinformatics. 2010 Jan 13;11:25. doi: 10.1186/1471-2105-11-25.
4
Durandal: fast exact clustering of protein decoys.
J Comput Chem. 2012 Feb 5;33(4):471-4. doi: 10.1002/jcc.21988. Epub 2011 Nov 26.
5
Improved protein structure selection using decoy-dependent discriminatory functions.
BMC Struct Biol. 2004 Jun 18;4:8. doi: 10.1186/1472-6807-4-8.
6
Clustering 100,000 protein structure decoys in minutes.
IEEE/ACM Trans Comput Biol Bioinform. 2012 May-Jun;9(3):765-73. doi: 10.1109/TCBB.2011.142.
7
Decoy selection for protein structure prediction via extreme gradient boosting and ranking.
BMC Bioinformatics. 2020 Dec 9;21(Suppl 1):189. doi: 10.1186/s12859-020-3523-9.
8
SPICKER: a clustering approach to identify near-native protein folds.
J Comput Chem. 2004 Apr 30;25(6):865-71. doi: 10.1002/jcc.20011.
9
Ranking near-native candidate protein structures via random forest classification.
BMC Bioinformatics. 2019 Dec 24;20(Suppl 25):683. doi: 10.1186/s12859-019-3257-8.
10
Improved Protein Decoy Selection via Non-Negative Matrix Factorization.
IEEE/ACM Trans Comput Biol Bioinform. 2022 May-Jun;19(3):1670-1682. doi: 10.1109/TCBB.2020.3049088. Epub 2022 Jun 3.

Cited by

1
Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data.
Genome Biol. 2018 Oct 25;19(1):172. doi: 10.1186/s13059-018-1536-8.
2
From Extraction of Local Structures of Protein Energy Landscapes to Improved Decoy Selection in Template-Free Protein Structure Prediction.
Molecules. 2018 Jan 19;23(1):216. doi: 10.3390/molecules23010216.
3
Pharmacophore-based virtual screening of catechol-o-methyltransferase (COMT) inhibitors to combat Alzheimer's disease.
J Biomol Struct Dyn. 2018 Nov;36(15):3938-3957. doi: 10.1080/07391102.2017.1404931. Epub 2017 Dec 27.
4
Identify High-Quality Protein Structural Models by Enhanced K-Means.
Biomed Res Int. 2017;2017:7294519. doi: 10.1155/2017/7294519. Epub 2017 Mar 22.

References

1
Entropy-accelerated exact clustering of protein decoys.
Bioinformatics. 2011 Apr 1;27(7):939-45. doi: 10.1093/bioinformatics/btr072. Epub 2011 Feb 9.
2
Finding the nearest neighbors in biological databases using less distance computations.
IEEE/ACM Trans Comput Biol Bioinform. 2010 Oct-Dec;7(4):669-80. doi: 10.1109/TCBB.2008.99.
3
Calibur: a tool for clustering large numbers of protein decoys.
BMC Bioinformatics. 2010 Jan 13;11:25. doi: 10.1186/1471-2105-11-25.
4
Fast determination of the optimal rotational matrix for macromolecular superpositions.
J Comput Chem. 2010 May;31(7):1561-3. doi: 10.1002/jcc.21439.
5
Selecting high quality protein structures from diverse conformational ensembles.
Biophys J. 2009 Sep 16;97(6):1728-36. doi: 10.1016/j.bpj.2009.06.046.
6
Prediction of global and local quality of CASP8 models by MULTICOM series.
Proteins. 2009;77 Suppl 9:181-4. doi: 10.1002/prot.22487.
7
Improving consensus structure by eliminating averaging artifacts.
BMC Struct Biol. 2009 Mar 6;9:12. doi: 10.1186/1472-6807-9-12.
8
CS23D: a web server for rapid protein structure generation using NMR chemical shifts and sequence data.
Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W496-502. doi: 10.1093/nar/gkn305. Epub 2008 May 30.
9
Macromolecular modeling with rosetta.
Annu Rev Biochem. 2008;77:363-82. doi: 10.1146/annurev.biochem.77.062906.171838.
10
Ab initio modeling of small proteins by iterative TASSER simulations.
BMC Biol. 2007 May 8;5:17. doi: 10.1186/1741-7007-5-17.