Calibur: a tool for clustering large numbers of protein decoys.

Affiliation

David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada.

Publication

BMC Bioinformatics. 2010 Jan 13;11:25. doi: 10.1186/1471-2105-11-25.

DOI:10.1186/1471-2105-11-25
PMID:20070892
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2881085/
Abstract

BACKGROUND

Ab initio protein structure prediction methods generate numerous structural candidates, which are referred to as decoys. The decoy with the most neighbors within a threshold distance is typically identified as the most representative decoy. However, the clustering of decoys needed for this criterion requires computations whose runtimes are at best quadratic in the number of decoys. As a result, there is currently no tool designed to exactly cluster very large numbers of decoys, which creates a bottleneck in the analysis.
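The selection criterion described above can be written as a brute-force neighbor count. The sketch below is illustrative and is not Calibur's code. It assumes that every decoy is a list of 3-D coordinates sharing the same atom order. To stay within the Python standard library, it compares decoys by plain coordinate RMSD without optimal superposition, which is a simplifying assumption. The names `plain_rmsd`, `neighbor_counts`, and `most_representative` are hypothetical. The double loop evaluates n(n - 1)/2 distances. For 30,000 decoys that comes to about 4.5 x 10^8 comparisons, which is the quadratic cost the abstract refers to.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Assumption: a decoy is a sequence of (x, y, z) coordinates with a fixed atom order.
Coords = Sequence[Tuple[float, float, float]]


def plain_rmsd(a: Coords, b: Coords) -> float:
    """RMSD of corresponding atoms WITHOUT optimal superposition (simplifying assumption)."""
    if len(a) != len(b) or not a:
        raise ValueError("decoys must be non-empty and have the same number of atoms")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(a, b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(a))


def neighbor_counts(decoys: Sequence[Coords], threshold: float,
                    dist: Callable[[Coords, Coords], float] = plain_rmsd) -> List[int]:
    """For each decoy, count the other decoys within `threshold` (exact, all pairs)."""
    n = len(decoys)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):  # n(n - 1)/2 distance evaluations in total
            if dist(decoys[i], decoys[j]) <= threshold:
                counts[i] += 1
                counts[j] += 1
    return counts


def most_representative(decoys: Sequence[Coords], threshold: float,
                        dist: Callable[[Coords, Coords], float] = plain_rmsd) -> int:
    """Index of the decoy with the most neighbors; ties go to the lowest index."""
    counts = neighbor_counts(decoys, threshold, dist)
    return max(range(len(decoys)), key=lambda i: counts[i])


if __name__ == "__main__":
    toy = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
        [(0.0, 0.2, 0.0), (1.0, 0.2, 0.0)],
        [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)],
    ]
    print(neighbor_counts(toy, 0.21))      # [2, 1, 1, 0]
    print(most_representative(toy, 0.21))  # 0
```

The distance function is passed in as a parameter. A superposition-based RMSD, for example one built on a Kabsch-style fit, can therefore replace `plain_rmsd` without changing the counting logic.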

RESULTS

Using three strategies aimed at enhancing performance (proximate decoys organization, preliminary screening via lower and upper bounds, and outlier filtering), we designed and implemented a software tool for clustering decoys called Calibur. We show empirical results indicating the effectiveness of each strategy, and the strategies are further fine-tuned according to their effectiveness.

Calibur demonstrated the ability to scale well as the number of decoys increases. For a sample of approximately 30,000 decoys, Calibur completed the analysis in one third of the time required when the strategies are not used.

For practical use, Calibur can automatically discover a suitable threshold distance for clustering from the input decoys. Several methods for this discovery are implemented in Calibur, and by default a very fast one is used. Using the default method, Calibur reported relatively good decoys in our tests.
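The abstract does not say which lower and upper bounds Calibur uses for its preliminary screening. The triangle inequality is one standard way to obtain such bounds, and it is used here as an assumption rather than as Calibur's actual method. If the decoy distance d is a metric and p is a reference (pivot) decoy, then |d(a, p) - d(b, p)| <= d(a, b) <= d(a, p) + d(b, p). A pair whose lower bound exceeds the threshold cannot be neighbors, and a pair whose upper bound is within the threshold must be neighbors. Only the remaining pairs need an exact distance computation. The function name `neighbor_counts_with_bounds` and the single-pivot design are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def neighbor_counts_with_bounds(items: Sequence[T], threshold: float,
                                dist: Callable[[T, T], float],
                                pivot_index: int = 0) -> Tuple[List[int], int]:
    """Exact neighbor counts, screening pairs with triangle-inequality bounds.

    Assumes `dist` is a metric. Returns (counts, number of exact pairwise
    distance evaluations that the bounds could not avoid).
    """
    n = len(items)
    pivot = items[pivot_index]
    to_pivot = [dist(pivot, x) for x in items]  # one distance per item, computed once
    counts = [0] * n
    exact_calls = 0
    for i in range(n):
        for j in range(i + 1, n):
            lower = abs(to_pivot[i] - to_pivot[j])  # d(i, j) >= |d(i, p) - d(j, p)|
            upper = to_pivot[i] + to_pivot[j]       # d(i, j) <= d(i, p) + d(j, p)
            if lower > threshold:
                continue                            # screened out: cannot be neighbors
            if upper <= threshold:
                is_neighbor = True                  # screened in: must be neighbors
            else:
                exact_calls += 1                    # bounds inconclusive: compute exactly
                is_neighbor = dist(items[i], items[j]) <= threshold
            if is_neighbor:
                counts[i] += 1
                counts[j] += 1
    return counts, exact_calls


if __name__ == "__main__":
    points = [0.0, 1.0, 1.5, 10.0, 10.4]  # toy 1-D "decoys"; |a - b| is a metric
    counts, calls = neighbor_counts_with_bounds(points, 1.0, lambda a, b: abs(a - b))
    print(counts, calls)  # [1, 2, 1, 1, 1] 2
```

On this toy input, only 2 of the 10 pairs need an exact distance, plus one distance per item to the pivot. The counts match a brute-force loop. In exact arithmetic the bounds never change the result, only the amount of work. How much work is saved on real decoy sets depends on how well the pivot separates them, so this sketch says nothing about Calibur's measured speedup.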

CONCLUSIONS

Calibur's ability to handle very large protein decoy sets makes it a useful tool for clustering decoys in ab initio protein structure prediction. As the number of decoys generated by these methods increases, we believe Calibur will become important for progress in the field.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d2/2881085/1adea4a471f3/1471-2105-11-25-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d2/2881085/b01ef63ae94d/1471-2105-11-25-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d2/2881085/c011e59f9b6d/1471-2105-11-25-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d2/2881085/f1f143504c90/1471-2105-11-25-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d2/2881085/b7e6a0f9f1de/1471-2105-11-25-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d2/2881085/9eeb8063af28/1471-2105-11-25-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d2/2881085/3ebba7e5b06f/1471-2105-11-25-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d2/2881085/9fad0cb6d303/1471-2105-11-25-8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d2/2881085/41f84b350ba8/1471-2105-11-25-9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d2/2881085/401434002425/1471-2105-11-25-10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d2/2881085/784346fd0542/1471-2105-11-25-11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76d2/2881085/3ccaae9488d6/1471-2105-11-25-12.jpg

