Multi-class clustering and prediction in the analysis of microarray data.

Authors

Tsai Chen-An, Lee Te-Chang, Ho I-Ching, Yang Ueng-Cheng, Chen Chun-Houh, Chen James J

Affiliation

Division of Biometry and Risk Assessment, National Center for Toxicological Research, Food and Drug Administration NCTR/FDA/HFT-20 Jefferson, AR 72079, USA.

Publication information

Math Biosci. 2005 Jan;193(1):79-100. doi: 10.1016/j.mbs.2004.07.002. Epub 2004 Dec 28.

DOI:10.1016/j.mbs.2004.07.002

PMID:15681277

Abstract

DNA microarray technology provides tools for studying the expression profiles of a large number of distinct genes simultaneously. This technology has been applied to sample clustering and sample prediction. Because of the large number of genes measured, many of the genes in the original data set are irrelevant to the analysis. Selection of discriminatory genes is critical to the accuracy of clustering and prediction. This paper considers a statistical significance testing approach to selecting discriminatory gene sets for multi-class clustering and prediction of experimental samples. A toxicogenomic data set with nine treatments (a control and eight metals: As, Cd, Ni, Cr, Sb, Pb, Cu, and AsV; 55 samples in total) is used to illustrate a general framework of the approach. Among four selected gene sets, the gene set omega(I), formed by the intersection of the F-test set and the union of the one-versus-all t-test sets, performs best in both clustering and prediction. Hierarchical clustering and two modified partition (k-means) methods all show that omega(I) groups the 55 samples into seven clusters reasonably well, with the As and AsV samples forming one cluster and the Cd and Cu samples forming another. With respect to prediction, the overall accuracy for omega(I) using the nearest neighbors algorithm to assign the 55 samples to one of the nine treatments is 85%.
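
The selection rule described in the abstract (genes significant in the overall F-test, intersected with genes significant in at least one one-versus-all t-test) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`f_stat`, `perm_pvalue`, `select_genes`, `loo_nn_accuracy`), the toy data, the 0.05 cutoff, the use of permutation p-values, and the leave-one-out 1-nearest-neighbor evaluation. The sketch relies on the fact that a two-sided pooled-variance t-test on two groups is equivalent to a one-way F-test on those two groups (F = t²), so one statistic serves both tests.

```python
import random
from statistics import mean


def f_stat(values, labels):
    """One-way ANOVA F statistic for a single gene."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    grand = mean(values)
    between = sum(len(x) * (mean(x) - grand) ** 2 for x in groups.values())
    within = sum((v - mean(x)) ** 2 for x in groups.values() for v in x)
    df_b, df_w = len(groups) - 1, len(values) - len(groups)
    return (between / df_b) / max(within / df_w, 1e-12)


def perm_pvalue(values, labels, rng, n_perm=500):
    """Permutation p-value (assumed here in place of tabulated F/t p-values)."""
    observed = f_stat(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_stat(values, shuffled) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_genes(expr, labels, alpha=0.05, seed=0):
    """Intersect the F-test set with the union of one-versus-all test sets."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    f_set, t_union = set(), set()
    for gene, values in expr.items():
        if perm_pvalue(values, labels, rng) <= alpha:
            f_set.add(gene)
        for c in classes:
            # With two groups, F equals t squared, so this is a two-sided
            # one-versus-all t-test on the recoded labels.
            one_vs_all = [lab == c for lab in labels]
            if perm_pvalue(values, one_vs_all, rng) <= alpha:
                t_union.add(gene)
                break
    return f_set & t_union


def loo_nn_accuracy(expr, labels, genes):
    """Leave-one-out 1-nearest-neighbor accuracy restricted to `genes`."""
    genes = sorted(genes)
    profiles = [[expr[g][i] for g in genes] for i in range(len(labels))]
    correct = 0
    for i, p in enumerate(profiles):
        nearest = min(
            (j for j in range(len(profiles)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, profiles[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(labels)


# Hypothetical toy data: 3 treatments x 4 samples, 2 genes.
labels = ["ctl"] * 4 + ["As"] * 4 + ["Cd"] * 4
expr = {
    "g_signal": [0.1, -0.2, 0.0, 0.2, 5.1, 4.9, 5.2, 4.8, 10.0, 9.9, 10.2, 10.1],
    "g_noise": [0.3, -1.1, 0.8, 0.1, -0.4, 1.0, -0.9, 0.2, 0.5, -0.7, 0.9, -0.6],
}
selected = select_genes(expr, labels)
accuracy = loo_nn_accuracy(expr, labels, selected)
print(sorted(selected), accuracy)
```

In this sketch the genes are selected using all samples before the leave-one-out loop, which tends to give an optimistic accuracy estimate; a more careful evaluation would repeat the selection inside each held-out fold.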

Similar articles

Analysis of a Gibbs sampler method for model-based clustering of gene expression data.

Bioinformatics. 2008 Jan 15;24(2):176-83. doi: 10.1093/bioinformatics/btm562. Epub 2007 Nov 22.

Iterative class discovery and feature selection using Minimal Spanning Trees.

BMC Bioinformatics. 2004 Sep 8;5:126. doi: 10.1186/1471-2105-5-126.

Comparison of supervised clustering methods to discriminate genotoxic from non-genotoxic carcinogens by gene expression profiling.

Mutat Res. 2005 Aug 4;575(1-2):17-33. doi: 10.1016/j.mrfmmm.2005.02.006. Epub 2005 Apr 19.

Randomized maps for assessing the reliability of patients clusters in DNA microarray data analyses.

Artif Intell Med. 2006 Jun;37(2):85-109. doi: 10.1016/j.artmed.2006.03.005. Epub 2006 May 23.

Application of Multi-SOM clustering approach to macrophage gene expression analysis.

Infect Genet Evol. 2009 May;9(3):328-36. doi: 10.1016/j.meegid.2008.09.009. Epub 2008 Oct 17.

Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm.

Bioinformatics. 2006 Jan 1;22(1):58-67. doi: 10.1093/bioinformatics/bti746. Epub 2005 Oct 27.

Modified fuzzy gap statistic for estimating preferable number of clusters in fuzzy k-means clustering.

J Biosci Bioeng. 2008 Mar;105(3):273-81. doi: 10.1263/jbb.105.273.

Mass distributed clustering: a new algorithm for repeated measurements in gene expression data.

Genome Inform. 2005;16(2):183-94.

Gene selection for sample classifications in microarray experiments.

DNA Cell Biol. 2004 Oct;23(10):607-14. doi: 10.1089/dna.2004.23.607.

Cited by

ABC gene-ranking for prediction of drug-induced cholestasis in rats.

Toxicol Rep. 2016 Jan 18;3:252-261. doi: 10.1016/j.toxrep.2016.01.009. eCollection 2016.

Nonlinear dependence in the discovery of differentially expressed genes.

ISRN Bioinform. 2012 Apr 12;2012:564715. doi: 10.5402/2012/564715. eCollection 2012.

Instance-based concept learning from multiclass DNA microarray data.

BMC Bioinformatics. 2006 Feb 16;7:73. doi: 10.1186/1471-2105-7-73.
