Comprehensive cluster analysis with Transitivity Clustering.

Affiliation

Buck Institute for Age Research, Novato, California, USA.

Publication information

Nat Protoc. 2011 Mar;6(3):285-95. doi: 10.1038/nprot.2010.197. Epub 2011 Feb 10.

Abstract

Transitivity Clustering is a method for partitioning biological data into groups of similar objects, such as genes. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards, and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.
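To make the partitioning idea concrete, the following is a minimal sketch, not the Transitivity Clustering software or its API. It assumes the weighted cluster editing view of the problem: pairs of objects whose similarity exceeds a threshold are linked, and a partition is scored by the total |similarity - threshold| of the links that must be added inside clusters or removed between clusters. The function names, the toy similarity matrix, and the threshold of 0.5 are all hypothetical choices for illustration.

```python
from itertools import combinations


def threshold_components(sim, threshold):
    """Baseline partition: connected components of the graph that links
    every pair whose similarity is above the threshold."""
    n = len(sim)
    label = [-1] * n
    clusters = []
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = len(clusters)
        stack, members = [start], []
        while stack:
            u = stack.pop()
            members.append(u)
            for v in range(n):
                if label[v] == -1 and sim[u][v] > threshold:
                    label[v] = len(clusters)
                    stack.append(v)
        clusters.append(sorted(members))
    return clusters


def editing_cost(sim, threshold, clusters):
    """Cost of editing the thresholded graph into the given partition.
    Each added or removed link costs |similarity - threshold|."""
    label = {obj: i for i, cluster in enumerate(clusters) for obj in cluster}
    cost = 0.0
    for u, v in combinations(range(len(sim)), 2):
        weight = sim[u][v] - threshold
        same_cluster = label[u] == label[v]
        if same_cluster and weight < 0:
            cost += -weight  # missing link inside a cluster must be added
        elif not same_cluster and weight > 0:
            cost += weight  # link between clusters must be removed
    return cost


# Hypothetical toy data: symmetric pairwise similarities for four objects.
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.2, 0.1],
    [0.8, 0.2, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
]
threshold = 0.5

baseline = threshold_components(sim, threshold)
baseline_cost = editing_cost(sim, threshold, baseline)
alternative = [[0, 1, 2], [3]]
alternative_cost = editing_cost(sim, threshold, alternative)
```

On this toy input the connected-components baseline merges all four objects into one group, while splitting off the weakly attached object gives a lower editing cost. This illustrates why a cost-minimizing partition can differ from simple single-linkage grouping; finding the minimum-cost partition in general requires the exact or heuristic algorithms that a dedicated tool provides.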
