CLAG: an unsupervised non hierarchical clustering algorithm handling biological data.

Affiliation

UPMC, UMR7238, Génomique Analytique, 15 rue de l'Ecole de Médecine, F-75006 Paris, France.

Publication information

BMC Bioinformatics. 2012 Aug 8;13:194. doi: 10.1186/1471-2105-13-194.

DOI:10.1186/1471-2105-13-194
PMID:23216858
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3519615/
Abstract

BACKGROUND

Searching for similarities in a set of biological data is intrinsically difficult because some data points should not be clustered at all, while others should belong to several clusters. Under these conditions, hierarchical agglomerative clustering is not appropriate. Moreover, if the dataset is not well enough understood, as is often the case, supervised classification is not appropriate either.

RESULTS

CLAG (for CLusters AGgregation) is an unsupervised non-hierarchical clustering algorithm designed to cluster a large variety of biological data and to provide a clustered matrix and numerical values indicating cluster strength. CLAG clusters correlation matrices for residues in protein families, gene-expression and miRNA data related to various cancer types, sets of species described by multidimensional vectors of characters, and binary matrices. It does not require all data points to be clustered, and it converges, yielding the same result at each run. Its simplicity and speed allow it to run on reasonably large datasets.
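
Two of the properties stated in this paragraph (not every data point has to join a cluster, and repeated runs give the same result) can be illustrated with a toy sketch. This is not the CLAG algorithm, whose steps the abstract does not describe. It is a plain threshold-plus-connected-components grouping on a small, made-up correlation matrix; the function name `threshold_clusters`, the matrix `CORR`, and the cutoff `0.8` are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def threshold_clusters(
    corr: List[List[float]], cutoff: float
) -> Tuple[List[List[int]], List[int]]:
    """Group indices linked by a pairwise correlation >= cutoff.

    Returns (clusters, unclustered). Points with no partner above the
    cutoff stay unclustered instead of being forced into a group.
    Illustrative only: this is NOT the CLAG algorithm.
    """
    n = len(corr)
    parent = list(range(n))  # union-find forest, one tree per group

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Fixed iteration order, no random initialisation: the result is
    # identical on every run.
    for i in range(n):
        for j in range(i + 1, n):
            if corr[i][j] >= cutoff:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    clusters = sorted(g for g in groups.values() if len(g) > 1)
    unclustered = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, unclustered


# Made-up symmetric correlation matrix for five items (hypothetical data).
CORR = [
    [1.00, 0.90, 0.85, 0.10, 0.20],
    [0.90, 1.00, 0.95, 0.00, 0.10],
    [0.85, 0.95, 1.00, 0.20, 0.30],
    [0.10, 0.00, 0.20, 1.00, 0.40],
    [0.20, 0.10, 0.30, 0.40, 1.00],
]

clusters, unclustered = threshold_clusters(CORR, cutoff=0.8)
print("clusters:", clusters)        # items 0, 1, 2 form one group
print("unclustered:", unclustered)  # items 3 and 4 are left out
```

In this toy version a single cutoff decides everything, whereas the abstract says CLAG also reports numerical values indicating cluster strength; that part is not reproduced here.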

CONCLUSIONS

CLAG can be used to investigate the cluster structure present in biological datasets and to identify its underlying graph. It proved more informative and accurate than several known clustering methods, such as hierarchical agglomerative clustering, k-means, fuzzy c-means, model-based clustering, and affinity propagation clustering, and it does not suffer from the convergence problem characteristic of the latter.

Figures 1-8 (images hosted by PMC)

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76e2/3519615/941137ce17de/1471-2105-13-194-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76e2/3519615/36a42c250199/1471-2105-13-194-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76e2/3519615/1c9d06517235/1471-2105-13-194-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76e2/3519615/13686a771b0d/1471-2105-13-194-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76e2/3519615/857533ca50a9/1471-2105-13-194-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76e2/3519615/8ea779440360/1471-2105-13-194-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76e2/3519615/a92db18f8364/1471-2105-13-194-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76e2/3519615/42031faafefa/1471-2105-13-194-8.jpg
