
MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering.

Affiliation

National Institute for Mathematical Sciences (NIMS), Yuseong, Daejeon 305-340, Republic of Korea.

Publication information

BMC Bioinformatics. 2009 Aug 22;10:260. doi: 10.1186/1471-2105-10-260.

DOI: 10.1186/1471-2105-10-260
PMID: 19698124
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2743671/
Abstract

BACKGROUND

Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance.

RESULTS

We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets.

CONCLUSION

The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors.
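
The co-membership idea described under RESULTS can be illustrated with a small sketch. This is not the authors' C++ implementation of MULTI-K: it is a generic co-association ensemble, assuming plain Lloyd's k-means with farthest-first seeding, a fixed frequency threshold, and connected components as the final grouping. All function names (`kmeans`, `co_membership`, `consensus_clusters`) and parameters are hypothetical, and the entropy-plot step is not reproduced because the abstract does not specify it.

```python
import random
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_centers(points, k, rng):
    """Assumed seeding: random first center, then repeatedly the farthest point."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_first_centers(points, k, rng)
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def co_membership(points, k_values, runs_per_k=5, seed=0):
    """Fraction of k-means runs (over several k) in which each pair shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    total = 0
    for k in k_values:
        for _ in range(runs_per_k):
            labels = kmeans(points, k, rng)
            total += 1
            for i, j in combinations(range(n), 2):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
                    counts[j][i] += 1
    return [[c / total for c in row] for row in counts]


def consensus_clusters(freq, threshold=0.5):
    """Group points linked by co-membership >= threshold (connected components)."""
    n = len(freq)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and freq[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data: two well-separated groups of five 2-D points (illustrative only).
group_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
group_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5)]
points = group_a + group_b

freq = co_membership(points, k_values=(2, 3), runs_per_k=5, seed=1)
clusters = consensus_clusters(freq, threshold=0.5)
print(clusters)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Runs with k = 3 split one of the two groups, yet pairs from the same group still co-occur in at least half of all runs, while pairs from different groups never do. That is the sense in which varying the number of clusters can expose robust co-memberships; the threshold of 0.5 is an arbitrary choice for this toy case.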

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fbf/2743671/121ca56dc3c8/1471-2105-10-260-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fbf/2743671/697d4fc6d913/1471-2105-10-260-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fbf/2743671/2aded42b4f5f/1471-2105-10-260-2.jpg
