Qu Yi, Xu Shizhong
Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA.
Bioinformatics. 2004 Aug 12;20(12):1905-13. doi: 10.1093/bioinformatics/bth177. Epub 2004 Mar 25.
Grouping genes with similar expression patterns is called gene clustering, which has proven to be a useful tool for extracting the biological information underlying gene expression data. Many clustering procedures have been applied successfully to microarray gene clustering; most of them belong to the family of heuristic clustering algorithms. Model-based algorithms are an alternative: they assume that the whole set of microarray data is a finite mixture of distributions of a certain type with different parameters. Application of model-based algorithms to unsupervised clustering has been reported. Here, for the first time, we demonstrate the use of a model-based algorithm in supervised clustering of microarray data.
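The finite-mixture assumption can be illustrated with a minimal sketch. This is not the authors' SAS program or any of methods I-III, whose details the abstract does not give. It assumes, purely for illustration, that each class follows a Gaussian distribution with a diagonal covariance matrix, that class labels are known for a training set of genes, and that unlabeled genes are assigned by posterior probability. The function names `fit_class_gaussians` and `posterior` are hypothetical.

```python
import numpy as np

def fit_class_gaussians(X, y):
    """Estimate (mixing proportion, mean, variance) for each labeled class.

    X: genes x conditions expression matrix; y: known class label per gene.
    Assumption: Gaussian components with diagonal covariance.
    """
    params = {}
    for label in np.unique(y):
        Xk = X[y == label]
        params[label] = (
            len(Xk) / len(X),        # mixing proportion of this class
            Xk.mean(axis=0),         # per-condition mean
            Xk.var(axis=0) + 1e-6,   # per-condition variance, floored for stability
        )
    return params

def posterior(X, params):
    """Return class labels and the posterior probability of each class per gene."""
    labels = sorted(params)
    log_joint = np.empty((X.shape[0], len(labels)))
    for j, label in enumerate(labels):
        pi, mu, var = params[label]
        # Log density of a diagonal Gaussian, summed over conditions.
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
        log_joint[:, j] = np.log(pi) + log_lik
    # Normalize in log space to avoid underflow.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    post /= post.sum(axis=1, keepdims=True)
    return labels, post

# Simulated example: two classes of genes measured under five conditions.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-2, 1, (50, 5)), rng.normal(2, 1, (50, 5))])
y_train = np.repeat([0, 1], 50)
X_new = np.vstack([rng.normal(-2, 1, (20, 5)), rng.normal(2, 1, (20, 5))])

params = fit_class_gaussians(X_train, y_train)
labels, post = posterior(X_new, params)
predicted = np.array(labels)[post.argmax(axis=1)]
```

In the unsupervised setting the same mixture would be fitted without labels, typically by the EM algorithm. Here the labels of the training genes fix the component memberships, so the parameters can be estimated directly.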
We applied the proposed methods to real gene expression data and to simulated data. We showed that the supervised model-based algorithm is superior to both the unsupervised method and the support vector machine (SVM) method.
The program implementing methods I-III in this report, written in the SAS language, is available upon request. The SVM software is available at http://svm.sdsc.edu/cgi-bin/nph-SVMsubmit.cgi