Zhang Ji-Gang, Deng Hong-Wen
Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University, Changsha, Hunan 410081, P. R. China.
BMC Bioinformatics. 2007 Oct 3;8(1):370. doi: 10.1186/1471-2105-8-370.
BACKGROUND: With DNA microarray data, selecting a compact subset of discriminative genes from thousands of candidates is a critical step for accurate classification of phenotypes, e.g., for disease diagnosis. Several widely used gene selection methods pick the top-ranked genes according to their individual power to discriminate between sample classes, without considering correlations among genes. A limitation of these methods is that they may produce gene sets containing redundant genes and yield an unnecessarily large number of candidate genes for classification analyses. Recent studies show that incorporating gene-to-gene correlations into gene selection can remove redundant genes and improve classification accuracy.
RESULTS: In this study, we propose a new method, Based Bayes error Filter (BBF), to select relevant genes and remove redundant genes in classification analyses of microarray data. The effectiveness and accuracy of this method are demonstrated through analyses of five publicly available microarray datasets. The results show that our gene selection method achieves higher accuracies than those reported in previous studies, while effectively selecting relevant genes, removing redundant genes, and producing small, efficient gene sets for sample classification.
CONCLUSION: The proposed method can effectively identify a compact set of genes with high classification accuracy. This study also indicates that applying the Bayes error is a feasible and effective way of removing redundant genes in gene selection.
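The general idea described in the abstract (rank genes by how well each one separates the classes, then drop genes that add little beyond those already chosen) can be illustrated with a minimal sketch. This is not the authors' BBF procedure, whose details are not given in the abstract. The sketch assumes two classes, assumes each gene is normally distributed within a class so that its Bayes error can be integrated numerically, and swaps in a plain Pearson correlation threshold as the redundancy test. The function names, the thresholds `max_error` and `max_abs_corr`, and the synthetic data are all hypothetical.

```python
import numpy as np


def gaussian_bayes_error(x, y, grid_size=2001):
    """Estimate the two-class Bayes error of a single gene.

    Assumption of this sketch: expression values are Gaussian within
    each class. The error is the integral of min(p0*f0, p1*f1).
    """
    x0, x1 = x[y == 0], x[y == 1]
    p0, p1 = len(x0) / len(x), len(x1) / len(x)
    m0, m1 = x0.mean(), x1.mean()
    s0, s1 = x0.std(ddof=1) + 1e-9, x1.std(ddof=1) + 1e-9

    lo = min(m0 - 8 * s0, m1 - 8 * s1)
    hi = max(m0 + 8 * s0, m1 + 8 * s1)
    grid = np.linspace(lo, hi, grid_size)

    def pdf(t, m, s):
        return np.exp(-0.5 * ((t - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

    integrand = np.minimum(p0 * pdf(grid, m0, s0), p1 * pdf(grid, m1, s1))
    dx = grid[1] - grid[0]
    # Trapezoid rule written out by hand to avoid version-specific NumPy helpers.
    return float(np.sum((integrand[:-1] + integrand[1:]) * 0.5 * dx))


def select_genes(X, y, max_error=0.2, max_abs_corr=0.9):
    """Greedy filter: keep low-error genes, skip highly correlated ones.

    The correlation test is a stand-in for a redundancy criterion and is
    an assumption of this sketch, not the criterion used by BBF.
    """
    errors = np.array([gaussian_bayes_error(X[:, j], y) for j in range(X.shape[1])])
    selected = []
    for j in np.argsort(errors):  # most discriminative genes first
        if errors[j] > max_error:
            break  # remaining genes are judged not relevant enough
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > max_abs_corr for k in selected
        )
        if not redundant:
            selected.append(int(j))
    return selected, errors


# Synthetic illustration only (not data from the study).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)
gene_a = rng.normal(loc=4.0 * y, scale=1.0)          # informative gene
gene_b = gene_a + rng.normal(scale=0.05, size=80)    # near-copy of gene_a
gene_c = rng.normal(size=80)                         # uninformative gene
X = np.column_stack([gene_a, gene_b, gene_c])

selected, errors = select_genes(X, y)
print("estimated Bayes errors:", np.round(errors, 3))
print("selected gene indices:", selected)
```

On this toy input, the near-copy gene should be skipped as redundant and the noise gene rejected as irrelevant, leaving a single gene. In practice the error threshold, the density model, and the redundancy criterion would all need to be chosen and validated for the data at hand.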