Sun Wenli, Chang Changgee, Zhao Yize, Long Qi
Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania, Philadelphia, PA 19104.
Department of Healthcare Policy and Research, Weill Cornell Medicine, Cornell University, New York, NY 10065.
Proc IEEE Int Conf Big Data. 2018 Dec;2018:1484-1493. doi: 10.1109/BigData.2018.8622484. Epub 2019 Jan 24.
Support vector machine (SVM) is a popular classification method for analyzing a wide range of data, including big data. Many SVM methods with feature selection have been developed under frequentist regularization or Bayesian shrinkage frameworks. Meanwhile, the importance of incorporating a priori biological knowledge, such as gene pathway information derived from gene regulatory networks, into the statistical analysis of genomic data has been recognized in recent years. In this article, we propose a new Bayesian SVM approach in which feature selection is guided by knowledge of the graphical structure among predictors. The proposed method uses a spike-and-slab prior for feature selection, combined with an Ising prior that encourages group-wise selection of predictors that are adjacent to each other on the known graph. A Gibbs sampling algorithm is used for Bayesian inference. The performance of our method is evaluated and compared with existing SVM methods in terms of prediction and feature selection across extensive simulation settings. In addition, we illustrate our method in the analysis of genomic data from a cancer study, demonstrating its advantage in generating biologically meaningful results and identifying potentially important features.
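To make the role of the Ising prior more concrete, the following is a minimal sketch of one common parameterization, not necessarily the one used in the paper. It assumes binary inclusion indicators `gamma`, a known undirected graph given as an edge list, and two hypothetical hyperparameters: `a` (a sparsity term, typically negative) and `b` (a smoothness term, typically positive). All function names and default values are illustrative assumptions.

```python
import math

def ising_log_prior(gamma, edges, a=-2.0, b=1.0):
    """Unnormalized log-probability of an inclusion vector under an
    assumed Ising prior: a * sum_j gamma_j + b * sum_{(j,k)} gamma_j * gamma_k.
    The hyperparameters a and b are illustrative, not from the paper."""
    sparsity = a * sum(gamma)
    smoothness = b * sum(gamma[j] * gamma[k] for j, k in edges)
    return sparsity + smoothness

def prior_inclusion_prob(j, gamma, neighbors, a=-2.0, b=1.0):
    """Prior conditional P(gamma_j = 1 | other indicators) implied by the
    form above. A Gibbs sampler would combine this with a likelihood
    term, which is omitted in this sketch."""
    s = a + b * sum(gamma[k] for k in neighbors[j])
    return 1.0 / (1.0 + math.exp(-s))

# Toy chain graph 0 - 1 - 2 (hypothetical example data).
edges = [(0, 1), (1, 2)]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

p_with_neighbor = prior_inclusion_prob(2, [1, 1, 0], neighbors)
p_without_neighbor = prior_inclusion_prob(2, [0, 0, 0], neighbors)
```

Under these assumptions, a predictor whose graph neighbor is already selected receives a higher prior inclusion probability than one whose neighbors are all excluded, which is the group-wise selection behavior the abstract describes.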