Yoshida Ryo, Higuchi Tomoyuki, Imoto Seiya, Miyano Satoru
Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan.
Bioinformatics. 2006 Jun 15;22(12):1538-9. doi: 10.1093/bioinformatics/btl129. Epub 2006 Apr 10.
One of the significant challenges in gene expression analysis is to find unknown subtypes of several diseases at the molecular level. This task can be addressed by grouping the gene expression patterns of the collected samples on the basis of a large number of genes. However, applying commonly used clustering methods to such a dataset is likely to fail owing to over-learning (overfitting), because the number of samples to be grouped is much smaller than the data dimension, which equals the number of genes in the dataset. To overcome this difficulty, we developed a novel model-based clustering method, referred to as mixed factors analysis. ArrayCluster is freely available software that performs mixed factors analysis. It provides analytic tools for clustering DNA microarray experiments and visualizing the data, an automatic detector of transcriptional modules of genes that are relevant to the calibrated molecular subtypes, and other functions.
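The general idea behind the abstract, clustering a few samples in a low-dimensional factor space rather than in the full gene space, can be illustrated with a minimal sketch. This is not mixed factors analysis and does not use ArrayCluster: as a simplified stand-in, it extracts a single leading factor by power iteration on the sample-by-sample Gram matrix (equivalent to the first principal component) and then splits the samples with a one-dimensional two-means step. The synthetic data, the function names, and all parameter values (20 samples, 200 genes, 30 marker genes, a shift of 4.0) are assumptions made only for illustration.

```python
import random

def center_columns(X):
    """Subtract each gene's (column's) mean across samples."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def leading_factor_scores(X, rng, iters=100):
    """Per-sample scores on the leading factor (up to sign and scale).

    Works on the n x n Gram matrix of the centered data, so the cost
    depends on the small number of samples, not the number of genes.
    """
    Xc = center_columns(X)
    n = len(Xc)
    G = [[sum(a * b for a, b in zip(Xc[i], Xc[k])) for k in range(n)]
         for i in range(n)]
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random start vector
    for _ in range(iters):  # power iteration
        w = [sum(G[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def two_means_1d(scores, iters=50):
    """Split one-dimensional scores into two clusters (labels 0 and 1)."""
    lo, hi = min(scores), max(scores)
    labels = [0] * len(scores)
    for _ in range(iters):
        labels = [0 if abs(s - lo) <= abs(s - hi) else 1 for s in scores]
        g0 = [s for s, l in zip(scores, labels) if l == 0]
        g1 = [s for s, l in zip(scores, labels) if l == 1]
        lo, hi = sum(g0) / len(g0), sum(g1) / len(g1)
    return labels

# Hypothetical data: 20 samples, 200 genes, far fewer samples than genes.
# The second group of 10 samples is shifted on the first 30 genes.
rng = random.Random(0)
n_per_group, n_genes, n_marker = 10, 200, 30
X = []
for group in (0, 1):
    for _ in range(n_per_group):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        if group == 1:
            for j in range(n_marker):
                row[j] += 4.0
        X.append(row)

scores = leading_factor_scores(X, rng)
labels = two_means_1d(scores)
print(labels)
```

A model-based method such as the one described in the abstract would estimate the factor structure and the cluster memberships jointly within one probabilistic model; the two separate steps here only show why working in a low-dimensional space sidesteps the problem of having far more genes than samples.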