On the classification of microarray gene-expression data.
Affiliation
Department of Mathematics, University of Queensland, St Lucia, QLD 4072, Australia.
Publication
Brief Bioinform. 2013 Jul;14(4):402-10. doi: 10.1093/bib/bbs056. Epub 2012 Sep 17.
We consider the classification of microarray gene-expression data. First, attention is given to the supervised case, where the tissue samples are classified with respect to a number of predefined classes and the intent is to assign a new unclassified tissue to one of these classes. The problems of forming a classifier and estimating its error rate are addressed in the context of there being a relatively small number of observations (tissue samples) compared to the number of variables (that is, the genes, which can number in the tens of thousands). We then proceed to the unsupervised case and consider the clustering of the tissue samples and also the clustering of the gene profiles. Both problems can be viewed as non-standard ones in statistics, and we address some of the key issues involved. The focus is on the use of mixture models to effect the clustering for both problems.
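The abstract does not say which mixture models are used, so the following is only a minimal sketch of mixture-model clustering of tissue samples, assuming normal components with diagonal covariance matrices fitted by the EM algorithm. The diagonal form is our simplification, not necessarily the authors' formulation: each component then has only p variances to estimate, which avoids singular covariance estimates when the number of genes p exceeds the number of tissues n. The function names (`fit_diag_gaussian_mixture`, `log_normal_diag`, `sq_dist`), the farthest-first initialisation and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def log_normal_diag(x, mean, var):
    """Log density of vector x under a normal with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fit_diag_gaussian_mixture(data, g, n_iter=50, min_var=1e-6, seed=0):
    """Fit a g-component normal mixture with diagonal covariances by EM.

    data: n samples (e.g. tissues), each a list of p expression values.
    Returns (weights, means, variances, resp), where resp[i][k] is the
    estimated posterior probability that sample i belongs to component k.
    """
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    # Hypothetical initialisation (farthest-first): one random sample, then
    # repeatedly the sample farthest from the centres already chosen.
    centres = [rng.randrange(n)]
    while len(centres) < g:
        centres.append(max(range(n), key=lambda i: min(sq_dist(data[i], data[c]) for c in centres)))
    means = [list(data[c]) for c in centres]
    # Start every component at the pooled per-gene variance.
    grand = [sum(row[j] for row in data) / n for j in range(p)]
    pooled = [max(sum((row[j] - grand[j]) ** 2 for row in data) / n, min_var) for j in range(p)]
    variances = [list(pooled) for _ in range(g)]
    weights = [1.0 / g] * g
    resp = []
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, using log-sum-exp.
        resp = []
        for x in data:
            logs = [math.log(weights[k]) + log_normal_diag(x, means[k], variances[k]) for k in range(g)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: weighted updates of mixing proportions, means and variances.
        for k in range(g):
            nk = max(sum(r[k] for r in resp), 1e-10)  # guard against empty components
            weights[k] = nk / n
            means[k] = [sum(r[k] * x[j] for r, x in zip(resp, data)) / nk for j in range(p)]
            variances[k] = [
                max(sum(r[k] * (x[j] - means[k][j]) ** 2 for r, x in zip(resp, data)) / nk, min_var)
                for j in range(p)
            ]
    return weights, means, variances, resp


# Made-up toy data: two groups of four "tissues" measured on two "genes".
samples = [
    [0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2],
    [10.0, 10.1], [9.9, 10.2], [10.1, 9.8], [10.2, 10.0],
]
weights, means, variances, resp = fit_diag_gaussian_mixture(samples, g=2)
# Assign each tissue to the component with the highest posterior probability.
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
print(labels)
```

Assigning each sample to the component with the largest estimated posterior probability turns the fitted mixture into a partition of the tissues. On real data with thousands of genes, some form of gene selection or dimension reduction would normally come first; that step is outside this sketch.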