Knowledge-Guided Bayesian Support Vector Machine for High-Dimensional Data with Application to Analysis of Genomics Data.

Author information

Sun Wenli, Chang Changgee, Zhao Yize, Long Qi

Affiliations

Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania, Philadelphia, PA, 19104.

Department of Healthcare Policy and Research, Weill Cornell Medicine, Cornell University, New York, NY, 10065.

Publication information

Proc IEEE Int Conf Big Data. 2018 Dec;2018:1484-1493. doi: 10.1109/BigData.2018.8622484. Epub 2019 Jan 24.

Abstract

The support vector machine (SVM) is a popular classification method for the analysis of a wide range of data, including big data. Many SVM methods with feature selection have been developed under frequentist regularization or Bayesian shrinkage frameworks. At the same time, the importance of incorporating a priori biological knowledge, such as gene pathway information derived from gene regulatory networks, into the statistical analysis of genomic data has been recognized in recent years. In this article, we propose a new Bayesian SVM approach in which feature selection is guided by knowledge of the graphical structure among predictors. The proposed method uses a spike-and-slab prior for feature selection, combined with an Ising prior that encourages group-wise selection of predictors that are adjacent to each other on the known graph. A Gibbs sampling algorithm is used for Bayesian inference. The performance of our method is evaluated and compared with existing SVM methods in terms of prediction and feature selection in extensive simulation settings. In addition, our method is illustrated in the analysis of genomic data from a cancer study, demonstrating its advantage in generating biologically meaningful results and identifying potentially important features.
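To illustrate how an Ising prior can encourage neighboring predictors to be selected together, here is a minimal Python sketch. It assumes one common parameterization over binary inclusion indicators, p(gamma) proportional to exp(a * sum_j gamma_j + b * sum over edges of gamma_j * gamma_k). The abstract does not state the exact form used in the paper, so the hyperparameter names `a` and `b`, the helper functions, and the toy graph are all hypothetical. The sketch covers only the prior, not the SVM likelihood or the full Gibbs sampler.

```python
import math


def ising_log_prior(gamma, edges, a, b):
    """Unnormalized log-probability of inclusion indicators under an Ising prior.

    gamma : list of 0/1 indicators (1 = feature selected)
    edges : list of (j, k) index pairs from the known feature graph
    a     : sparsity parameter (assumed name; negative values favor exclusion)
    b     : smoothness parameter (assumed name; positive values reward
            selecting both endpoints of an edge)
    """
    sparsity = a * sum(gamma)
    smoothness = b * sum(gamma[j] * gamma[k] for j, k in edges)
    return sparsity + smoothness


def conditional_inclusion_prob(j, gamma, neighbors, a, b):
    """Prior probability that feature j is selected given all other indicators.

    Under the parameterization above, the log-odds are
    a + b * (number of selected neighbors of j).
    """
    active = sum(gamma[k] for k in neighbors[j])
    return 1.0 / (1.0 + math.exp(-(a + b * active)))


# Toy graph: a chain 0 - 1 - 2 plus an isolated feature 3 (illustrative only).
edges = [(0, 1), (1, 2)]
neighbors = {0: [1], 1: [0, 2], 2: [1], 3: []}
gamma = [1, 0, 1, 0]
a, b = -2.0, 1.5

# Feature 1 has two selected neighbors; feature 3 has none.
p_connected = conditional_inclusion_prob(1, gamma, neighbors, a, b)
p_isolated = conditional_inclusion_prob(3, gamma, neighbors, a, b)
```

In this toy setting the prior inclusion probability of feature 1 is higher than that of the isolated feature 3, which is the group-wise selection effect the abstract describes. In a full Gibbs sampler, this conditional would be combined with the likelihood contribution of each feature before sampling its indicator.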




