Huang Desheng, Wei Peng, Pan Wei
Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, 55455, USA.
OMICS. 2006 Spring;10(1):28-39. doi: 10.1089/omi.2006.10.28.
It is increasingly recognized that incorporating prior knowledge into cluster analysis can yield more reliable and meaningful clusters. In contrast to standard model-based clustering with a global mixture model, which uses no prior information, a stratified mixture model was recently proposed to incorporate gene functions or biological pathways as priors in model-based clustering of gene expression profiles: the gene functional groups form the strata of the stratified mixture model. Albeit useful, the stratified method may be less efficient than the global analysis if the strata are non-informative for clustering. We propose a weighted method that aims to strike a balance between the stratified and the global analyses: it takes a weighted combination of the clustering results of the stratified analysis and those of the global analysis, with the weight determined by the data. More generally, the weighted method can exploit the hierarchical structure of most existing gene functional annotation systems, such as MIPS and Gene Ontology (GO), and can facilitate the choice of appropriate gene functional groups as priors. We use simulated and real data to demonstrate the feasibility and advantages of the proposed method.
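The idea of a data-determined weight between a stratified and a global mixture can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: one-dimensional expression values, two Gaussian clusters whose means and standard deviations are known and shared by all strata, and a weight w that blends stratum-specific mixing proportions with global ones, pi_sk(w) = w * pi_sk + (1 - w) * pi_k. Under this blend, w = 1 recovers the stratified model and w = 0 recovers the global model. Here w is chosen by grid search on the log-likelihood of held-out genes. This selection criterion is an illustrative assumption and not necessarily the authors' estimator. All function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def responsibilities(x, props, mus, sds):
    """Posterior cluster-membership probabilities of one observation."""
    dens = [p * normal_pdf(x, m, s) for p, m, s in zip(props, mus, sds)]
    total = sum(dens)
    return [d / total for d in dens]


def estimate_props(xs, mus, sds, n_iter=50):
    """EM updates for the mixing proportions only (means and SDs held fixed, an assumption)."""
    k = len(mus)
    props = [1.0 / k] * k
    for _ in range(n_iter):
        acc = [0.0] * k
        for x in xs:
            for j, r in enumerate(responsibilities(x, props, mus, sds)):
                acc[j] += r
        props = [a / len(xs) for a in acc]
    return props


def blended_props(w, strat_props, global_props):
    """Hypothetical weighting: w = 1 gives the stratified model, w = 0 the global one."""
    return [w * a + (1.0 - w) * b for a, b in zip(strat_props, global_props)]


def log_lik(data, w, strat_props, global_props, mus, sds):
    """Log-likelihood of (stratum, value) pairs under the blended mixture."""
    total = 0.0
    for s, x in data:
        props = blended_props(w, strat_props[s], global_props)
        total += math.log(sum(p * normal_pdf(x, m, sd)
                              for p, m, sd in zip(props, mus, sds)))
    return total


def choose_weight(train, valid, mus, sds, grid=None):
    """Fit global and per-stratum proportions on `train`, then pick w on `valid`."""
    grid = grid or [i / 10 for i in range(11)]
    global_props = estimate_props([x for _, x in train], mus, sds)
    strata = {s for s, _ in train}
    strat_props = {s: estimate_props([x for t, x in train if t == s], mus, sds)
                   for s in strata}
    # Every stratum in `valid` must also appear in `train`.
    best_w = max(grid, key=lambda w: log_lik(valid, w, strat_props,
                                             global_props, mus, sds))
    return best_w, strat_props, global_props


if __name__ == "__main__":
    random.seed(0)
    mus, sds = [0.0, 3.0], [1.0, 1.0]

    def simulate(n):
        # Informative simulated strata: "A" genes come from cluster 0, "B" genes from cluster 1.
        return ([("A", random.gauss(0.0, 1.0)) for _ in range(n)]
                + [("B", random.gauss(3.0, 1.0)) for _ in range(n)])

    train, valid = simulate(100), simulate(100)
    w, strat_props, global_props = choose_weight(train, valid, mus, sds)
    print("selected weight w =", w)
```

With informative strata, such as the simulated strata "A" and "B" whose genes each come from a single cluster, the stratum-specific proportions predict held-out genes better and a large w is selected. With non-informative strata, the stratum-specific proportions are only noisy estimates of the global ones. A large w then offers little gain on held-out genes, which is the situation in which the abstract notes that the stratified analysis may lose efficiency.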