Multi-class clustering and prediction in the analysis of microarray data.

Author information

Tsai Chen-An, Lee Te-Chang, Ho I-Ching, Yang Ueng-Cheng, Chen Chun-Houh, Chen James J

Affiliation

Division of Biometry and Risk Assessment, National Center for Toxicological Research, Food and Drug Administration, NCTR/FDA/HFT-20, Jefferson, AR 72079, USA.

Publication information

Math Biosci. 2005 Jan;193(1):79-100. doi: 10.1016/j.mbs.2004.07.002. Epub 2004 Dec 28.

Abstract

DNA microarray technology provides tools for studying the expression profiles of a large number of distinct genes simultaneously. This technology has been applied to sample clustering and sample prediction. Because so many genes are measured, many of the genes in the original data set are irrelevant to the analysis. Selection of discriminatory genes is critical to the accuracy of clustering and prediction. This paper considers a statistical significance testing approach to selecting discriminatory gene sets for multi-class clustering and prediction of experimental samples. A toxicogenomic data set with nine treatments (a control and eight metals: As, Cd, Ni, Cr, Sb, Pb, Cu, and AsV; 55 samples in total) is used to illustrate a general framework of the approach. Among four selected gene sets, the gene set omega(I), formed by intersecting the set selected by the F-test with the union of the sets selected by the one-versus-all t-tests, performs best in both clustering and prediction. Hierarchical clustering and two modified partition (k-means) methods all show that omega(I) groups the 55 samples into seven clusters reasonably well, with the As and AsV samples forming one cluster and the Cd and Cu samples forming another. With respect to prediction, the overall accuracy of the nearest neighbors algorithm in assigning the 55 samples to one of the nine treatments using omega(I) is 85%.
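The construction of omega(I) described in the abstract (genes passing the F-test, intersected with the union of genes passing any one-versus-all t-test) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, the pooled-variance form of the t-test, and the cut-offs `f_crit` and `t_crit` are all assumptions made for illustration, and the abstract does not say which significance levels or multiplicity adjustments the paper uses. The sketch thresholds the test statistics directly, which matches thresholding p-values only when every gene shares the same degrees of freedom.

```python
from math import copysign, inf, sqrt
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F statistic for a single gene across all classes."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic (an assumed test form)."""
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    if ss == 0:
        return copysign(inf, ma - mb) if ma != mb else 0.0
    pooled_var = ss / (len(a) + len(b) - 2)
    return (ma - mb) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


def select_genes(expr, labels, f_crit, t_crit):
    """Return genes in (F-test set) AND (union of one-versus-all t-test sets).

    expr maps a gene name to its expression values, one per sample;
    labels gives the treatment of each sample in the same order.
    f_crit and t_crit are hypothetical cut-offs on the statistics.
    """
    classes = sorted(set(labels))
    f_set, t_union = set(), set()
    for gene, values in expr.items():
        if f_statistic(values, labels) > f_crit:
            f_set.add(gene)
        for c in classes:
            inside = [v for v, lab in zip(values, labels) if lab == c]
            rest = [v for v, lab in zip(values, labels) if lab != c]
            if abs(t_statistic(inside, rest)) > t_crit:
                t_union.add(gene)  # one significant contrast is enough
                break
    return f_set & t_union


# Toy data (invented): three treatments with three samples each.
labels = ["control"] * 3 + ["As"] * 3 + ["Cd"] * 3
expr = {
    "g1": [1.0, 1.1, 0.9, 3.0, 3.1, 2.9, 5.0, 5.1, 4.9],  # differs in every class
    "g2": [1.0, 2.0, 3.0, 1.1, 2.1, 2.9, 0.9, 2.0, 3.1],  # no class effect
    "g3": [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 4.0, 4.1, 3.9],  # only Cd differs
}
# Cut-offs near the 5% critical values for this toy layout: F(2, 6) and t(7).
selected = select_genes(expr, labels, f_crit=5.14, t_crit=2.365)
print(sorted(selected))
```

In this toy example the gene with no class effect is dropped, while the gene that separates only one treatment from the rest is kept because a single significant one-versus-all contrast admits it to the union.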
