Zhou Xiaobo, Liu Kuang-Yu, Wong Stephen T C
Harvard Center for Neurodegeneration and Repair, Center for Bioinformatics, Harvard Medical School, 220 Longwood Avenue, Boston, MA 02115, USA.
J Biomed Inform. 2004 Aug;37(4):249-59. doi: 10.1016/j.jbi.2004.07.009.
In microarray-based cancer classification and prediction, gene selection is an important research problem because the number of genes is large while the number of experimental conditions is small. In this paper, we propose a Bayesian approach to gene selection and classification using the logistic regression model. The basic idea of our approach is to use a logistic regression model to relate gene expression to the class labels. We use Gibbs sampling and Markov chain Monte Carlo (MCMC) methods to discover important genes. To implement the Gibbs sampler and MCMC search, we derive a posterior distribution of the selected genes given the observed data. After the important genes are identified, the same logistic regression model is used for cancer classification and prediction. Issues in the efficient implementation of the proposed method are discussed. The proposed method is evaluated on several large microarray data sets, including hereditary breast cancer, small round blue-cell tumors, and acute leukemia. The results show that the method can effectively identify important genes consistent with known biological findings while also achieving high classification accuracy. Finally, the robustness and sensitivity of the proposed method are investigated.
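The general idea of sampling gene-inclusion indicators under a logistic regression model can be illustrated with a minimal sketch. This is not the posterior derived in the paper: the sketch assumes a Gaussian prior on the coefficients and swaps in a Laplace approximation to the marginal likelihood so that each indicator can be drawn from its approximate conditional distribution. The function names (`log_marginal`, `gibbs_gene_selection`), the prior settings (`tau2`, `prior_incl`), and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def log_marginal(X, y, gamma, tau2=1.0, n_iter=25):
    """Laplace approximation to log p(y | gamma) for logistic regression
    with an N(0, tau2) prior on each coefficient (an assumption of this sketch)."""
    n = X.shape[0]
    Xg = np.hstack([np.ones((n, 1)), X[:, gamma]])  # intercept + selected genes
    k = Xg.shape[1]
    beta = np.zeros(k)
    for _ in range(n_iter):  # Newton steps toward the MAP estimate
        eta = np.clip(Xg @ beta, -30, 30)
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = Xg.T @ (y - p) - beta / tau2
        H = (Xg * (p * (1 - p))[:, None]).T @ Xg + np.eye(k) / tau2
        beta = beta + np.linalg.solve(H, grad)
    # Evaluate the log-likelihood and curvature at the final estimate
    eta = np.clip(Xg @ beta, -30, 30)
    p = 1.0 / (1.0 + np.exp(-eta))
    H = (Xg * (p * (1 - p))[:, None]).T @ Xg + np.eye(k) / tau2
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    _, logdet = np.linalg.slogdet(H)
    return loglik - beta @ beta / (2 * tau2) - 0.5 * k * np.log(tau2) - 0.5 * logdet

def gibbs_gene_selection(X, y, n_sweeps=20, burn_in=5, prior_incl=0.1, seed=0):
    """Sweep over genes, drawing each inclusion indicator from its
    (approximate) conditional posterior; return inclusion frequencies."""
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    gamma = np.zeros(n_genes, dtype=bool)
    prior_log_odds = np.log(prior_incl / (1 - prior_incl))
    counts = np.zeros(n_genes)
    for sweep in range(n_sweeps):
        for j in range(n_genes):
            g1, g0 = gamma.copy(), gamma.copy()
            g1[j], g0[j] = True, False
            log_odds = log_marginal(X, y, g1) - log_marginal(X, y, g0) + prior_log_odds
            prob = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -50, 50)))
            gamma[j] = rng.random() < prob
        if sweep >= burn_in:
            counts += gamma
    return counts / (n_sweeps - burn_in)

# Synthetic stand-in for expression data: 80 samples, 20 genes,
# with only genes 0 and 1 related to the class label.
rng = np.random.default_rng(1)
X = rng.standard_normal((80, 20))
logit = 3.0 * X[:, 0] - 3.0 * X[:, 1]
y = (rng.random(80) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

freq = gibbs_gene_selection(X, y)
print("genes ranked by inclusion frequency:", np.argsort(freq)[::-1][:5])
```

Genes with high inclusion frequency would then be the ones retained for the final logistic regression classifier. Real microarray data has thousands of genes, so refitting the model for every indicator as done here would be far too slow; the efficient-implementation issues mentioned in the abstract address exactly this kind of cost.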