Bayesian hierarchical structured variable selection methods with application to MIP studies in breast cancer.

Author information

Zhang Lin, Baladandayuthapani Veerabhadran, Mallick Bani K, Manyam Ganiraju C, Thompson Patricia A, Bondy Melissa L, Do Kim-Anh

Affiliations

Department of Statistics, Texas A&M University, College Station, Texas, U.S.A.

Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A.

Publication information

J R Stat Soc Ser C Appl Stat. 2014 Aug;63(4):595-620. doi: 10.1111/rssc.12053.

Abstract

The analysis of alterations that may occur in nature when segments of chromosomes are copied (known as copy number alterations) has been a focus of research to identify genetic markers of cancer. One high-throughput technique recently adopted is the use of molecular inversion probes (MIPs) to measure probe copy number changes. The resulting data consist of high-dimensional copy number profiles that can be used to ascertain probe-specific copy number alterations in correlative studies with patient outcomes to guide risk stratification and future treatment. We propose a novel Bayesian variable selection method, the hierarchical structured variable selection (HSVS) method, which accounts for the natural gene and probe-within-gene architecture to identify important genes and probes associated with clinically relevant outcomes. We propose the HSVS model for grouped variable selection, where simultaneous selection of both groups and within-group variables is of interest. The HSVS model utilizes a discrete mixture prior distribution for group selection and group-specific Bayesian lasso hierarchies for variable selection within groups. We provide methods for accounting for serial correlations within groups that incorporate Bayesian fused lasso methods for within-group selection. Through simulations we establish that our method results in lower model errors than other methods when a natural grouping structure exists. We apply our method to an MIP study of breast cancer and show that it identifies genes and probes that are significantly associated with clinically relevant subtypes of breast cancer.
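The two-level prior described in the abstract (a discrete mixture prior to select groups, with Bayesian lasso hierarchies inside each selected group) can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `sample_hsvs_prior` and the parameters `p_include`, `lam`, and `sigma` are hypothetical, the inclusion probability and lasso rate are held fixed rather than given hyperpriors, and the fused lasso extension for serial correlation is omitted.

```python
import math
import random


def sample_hsvs_prior(group_sizes, p_include=0.5, lam=1.0, sigma=1.0, seed=0):
    """Draw one coefficient vector from a group-level spike-and-slab prior
    whose slab is a Bayesian-lasso scale mixture (illustrative assumption,
    not the paper's exact specification)."""
    rng = random.Random(seed)
    gammas, betas = [], []
    for size in group_sizes:
        # Group-level indicator: 1 keeps the group (gene), 0 drops it entirely.
        gamma = 1 if rng.random() < p_include else 0
        gammas.append(gamma)
        group = []
        for _ in range(size):
            if gamma == 0:
                # Spike: point mass at zero for every probe in the group.
                group.append(0.0)
            else:
                # Slab: tau^2 ~ Exponential(rate = lam^2 / 2) and
                # beta | tau^2 ~ N(0, sigma^2 * tau^2), whose marginal
                # is a Laplace (lasso-type) distribution.
                tau2 = rng.expovariate(lam ** 2 / 2.0)
                group.append(rng.gauss(0.0, sigma * math.sqrt(tau2)))
        betas.append(group)
    return gammas, betas


if __name__ == "__main__":
    # Three hypothetical genes with 3, 2, and 4 probes each.
    gammas, betas = sample_hsvs_prior([3, 2, 4], seed=1)
    for g, (gamma, coefs) in enumerate(zip(gammas, betas)):
        print(f"group {g}: included={gamma}, coefficients={coefs}")
```

In this sketch, a group drawn with indicator 0 has all of its coefficients set exactly to zero, while an included group receives independently shrunk coefficients; in a full posterior analysis the within-group shrinkage is what would allow individual probes in a selected gene to be pulled toward zero.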
