Binary state pattern clustering: a digital paradigm for class and biomarker discovery in gene microarray studies of cancer.

Author information

Beattie Bradley J, Robinson Peter N

Affiliation

Department of Neurology, Memorial Sloan-Kettering Cancer Center, New York, New York 10021, USA.

Publication information

J Comput Biol. 2006 Jun;13(5):1114-30. doi: 10.1089/cmb.2006.13.1114.

DOI:10.1089/cmb.2006.13.1114

PMID:16796554

Abstract

Class and biomarker discovery continue to be among the preeminent goals in gene microarray studies of cancer. We have developed a new data mining technique, Binary State Pattern Clustering (BSPC), that is specifically adapted for these purposes and is intended for cancer and other categorical datasets. BSPC is capable of uncovering statistically significant sample subclasses and associated marker genes in a completely unsupervised manner. This is accomplished through the application of a digital paradigm, where the expression level of each potential marker gene is treated as being representative of its discrete functional state. Multiple genes that divide samples into states along the same boundaries form a kind of gene-cluster that has an associated sample-cluster. BSPC is an extremely fast deterministic algorithm that scales well to large datasets. Here we describe results of its application to three publicly available oligonucleotide microarray datasets. Using an alpha-level of 0.05, clusters reproducing many of the known sample classifications were identified along with associated biomarkers. In addition, a number of simulations were conducted using shuffled versions of each of the original datasets, noise-added datasets, and completely artificial datasets. The robustness of BSPC was compared to that of three other publicly available clustering methods: ISIS, CTWC and SAMBA. The simulations demonstrate BSPC's substantially greater noise tolerance and confirm the accuracy of our calculations of statistical significance.
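
The "digital paradigm" described in the abstract can be illustrated with a small sketch. This is not the authors' BSPC implementation: the abstract does not say how the state boundary is chosen, so the widest-gap threshold used here is an assumption, as are the function names (`binarize`, `canonical`, `group_genes`) and the toy expression values. The sketch only shows the general idea of reducing each gene to a binary state per sample and grouping genes that split the samples along the same boundary; it omits the significance testing the abstract mentions.

```python
from collections import defaultdict


def binarize(values):
    """Map one gene's expression values to 0/1 states.

    Assumption (not from the paper): the boundary is placed in the
    middle of the widest gap between neighbouring sorted values.
    Requires at least two samples.
    """
    order = sorted(values)
    gaps = [(order[i + 1] - order[i], i) for i in range(len(order) - 1)]
    _, i = max(gaps)
    threshold = (order[i] + order[i + 1]) / 2
    return tuple(int(v > threshold) for v in values)


def canonical(pattern):
    """Treat a pattern and its complement as the same sample split."""
    if pattern[0] == 1:
        return tuple(1 - b for b in pattern)
    return pattern


def group_genes(expression):
    """Group genes whose binary patterns split the samples identically."""
    groups = defaultdict(list)
    for gene, values in expression.items():
        groups[canonical(binarize(values))].append(gene)
    return dict(groups)


# Hypothetical toy data: three genes measured in five samples.
expression = {
    "geneA": [1.0, 1.2, 0.9, 5.1, 5.3],
    "geneB": [7.9, 8.1, 8.0, 2.0, 2.2],
    "geneC": [0.5, 4.0, 0.6, 4.2, 0.4],
}
clusters = group_genes(expression)
```

In this toy input, `geneA` and `geneB` separate the first three samples from the last two (in opposite directions), so they fall into one gene group whose key also identifies the associated sample split, while `geneC` forms its own group.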

Similar articles

Graph-based consensus clustering for class discovery from gene expression data.

Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.

Class discovery from gene expression data based on perturbation and cluster ensemble.

IEEE Trans Nanobioscience. 2009 Jun;8(2):147-60. doi: 10.1109/TNB.2009.2023321. Epub 2009 Jun 2.

An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data.

Bioinformatics. 2003 Nov 1;19(16):2131-40. doi: 10.1093/bioinformatics/btg296.

Biomarker discovery across annotated and unannotated microarray datasets using semi-supervised learning.

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S7. doi: 10.1186/1471-2164-9-S2-S7.

Coclustering of human cancer microarrays using Minimum Sum-Squared Residue coclustering.

IEEE/ACM Trans Comput Biol Bioinform. 2008 Jul-Sep;5(3):385-400. doi: 10.1109/TCBB.2007.70268.

Clustering of gene expression data: performance and similarity analysis.

BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S19. doi: 10.1186/1471-2105-7-S4-S19.

Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm.

Bioinformatics. 2006 Jan 1;22(1):58-67. doi: 10.1093/bioinformatics/bti746. Epub 2005 Oct 27.

Evaluation of clustering algorithms for gene expression data.

BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S17. doi: 10.1186/1471-2105-7-S4-S17.

LCE: a link-based cluster ensemble method for improved gene expression data analysis.

Bioinformatics. 2010 Jun 15;26(12):1513-9. doi: 10.1093/bioinformatics/btq226. Epub 2010 May 5.

Cited by

Modelling gene expression profiles related to prostate tumor progression using binary states.

Theor Biol Med Model. 2013 May 31;10:37. doi: 10.1186/1742-4682-10-37.

Identification of novel stem cell markers using gap analysis of gene expression data.

Genome Biol. 2007;8(9):R193. doi: 10.1186/gb-2007-8-9-r193.
