Mass distributed clustering: a new algorithm for repeated measurements in gene expression data.

Authors

Matsumoto Shinya, Aisaki Ken-ichi, Kanno Jun

Affiliation

Teradata Division, NCR Japan, Ltd. 2-4-1 Shiba-koen, Tokyo 105-0011, Japan.

Publication information

Genome Inform. 2005;16(2):183-94.

PMID:16901101

Abstract

The availability of whole-genome sequence data and high-throughput techniques such as DNA microarrays enable researchers to comprehensively monitor changes in gene expression in a given organ or tissue. A single measurement can cover more than 30,000 genes, which makes clustering methods essential for analysis. Biologists usually design experimental protocols so that statistical significance can be evaluated; often, they conduct experiments in triplicate to obtain a mean and standard deviation. Existing clustering methods usually operate on these mean or median values rather than on the original data, and account for significance by omitting data with large standard deviations, which discards potentially useful information. We propose a clustering method that treats each triplicate data set as a probability distribution function instead of pooling the data points into a median or mean. This method permits truly unsupervised clustering of DNA microarray data.
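The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each triplicate can be summarised as a normal distribution and compares genes with the Bhattacharyya distance; both choices, and all names and sample values (`fit_gaussian`, `profile_distance`, the three toy genes), are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def fit_gaussian(replicates, min_sd=1e-6):
    """Summarise replicate measurements as (mean, sd) of a normal distribution.

    Assumption: a normal model; min_sd guards against zero spread.
    """
    return mean(replicates), max(stdev(replicates), min_sd)


def bhattacharyya(p, q):
    """Bhattacharyya distance between two univariate normal distributions."""
    (m1, s1), (m2, s2) = p, q
    v1, v2 = s1 * s1, s2 * s2
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    spread_term = 0.5 * math.log((v1 + v2) / (2.0 * s1 * s2))
    return mean_term + spread_term


def profile_distance(gene_a, gene_b):
    """Sum the per-condition distribution distances of two genes.

    Each gene is a list of replicate lists, one per experimental condition.
    """
    return sum(
        bhattacharyya(fit_gaussian(a), fit_gaussian(b))
        for a, b in zip(gene_a, gene_b)
    )


# Hypothetical toy data: one condition, three replicates, identical means.
gene_tight = [[9.9, 10.0, 10.1]]
gene_similar = [[9.8, 10.0, 10.2]]
gene_noisy = [[8.0, 10.0, 12.0]]

# A mean-based distance cannot separate these genes at all.
mean_gap = abs(mean(gene_tight[0]) - mean(gene_noisy[0]))

# A distribution-based distance keeps the noisy gene in play but far away.
d_similar = profile_distance(gene_tight, gene_similar)
d_noisy = profile_distance(gene_tight, gene_noisy)

print(f"mean gap: {mean_gap:.3f}")
print(f"tight vs similar: {d_similar:.3f}")
print(f"tight vs noisy:   {d_noisy:.3f}")
```

A pairwise distance of this kind could then be passed to any standard clustering routine. The point of the sketch is only that genes with large spread stay in the analysis and are distinguished by their distributions rather than filtered out beforehand.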
