Zhang Lin, Baladandayuthapani Veerabhadran, Mallick Bani K, Manyam Ganiraju C, Thompson Patricia A, Bondy Melissa L, Do Kim-Anh
Department of Statistics, Texas A&M University, College Station, Texas, U.S.A.
Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, U.S.A.
J R Stat Soc Ser C Appl Stat. 2014 Aug;63(4):595-620. doi: 10.1111/rssc.12053.
The analysis of alterations that can arise when segments of chromosomes are copied (known as copy number alterations) has been a focus of research aimed at identifying genetic markers of cancer. One recently adopted high-throughput technique uses molecular inversion probes (MIPs) to measure probe-level copy number changes. The resulting data consist of high-dimensional copy number profiles, which can be used in correlative studies with patient outcomes to ascertain probe-specific copy number alterations that may guide risk stratification and future treatment. We propose a novel Bayesian variable selection method, hierarchical structured variable selection (HSVS), which accounts for the natural gene and probe-within-gene architecture to identify important genes and probes associated with clinically relevant outcomes. HSVS is designed for grouped variable selection problems in which groups and the variables within them must be selected simultaneously. The HSVS model uses a discrete mixture prior distribution for group selection and group-specific Bayesian lasso hierarchies for variable selection within groups. We also provide extensions that account for serial correlation within groups by incorporating Bayesian fused lasso methods for within-group selection. Simulations show that our method yields lower model error than competing methods when a natural grouping structure exists. We apply our method to an MIP study of breast cancer and show that it identifies genes and probes significantly associated with clinically relevant subtypes of breast cancer.
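The two-level structure described above, in which a mixture prior switches whole groups (genes) in or out and a Bayesian lasso shrinks individual variables (probes) within selected groups, can be illustrated by drawing coefficients from a prior of that shape. The following is a minimal sketch under stated assumptions, not the paper's exact specification: it assumes a Bernoulli(`pi`) group indicator, a point mass at zero for excluded groups, and the Park–Casella scale-mixture form of the Bayesian lasso (`tau2 ~ Exponential(rate = lam**2 / 2)`, `beta | tau2 ~ Normal(0, sigma**2 * tau2)`). The function name `sample_hsvs_style_prior` and the hyperparameter values are hypothetical, and the fused-lasso variant for serially correlated probes is not modelled here.

```python
import math
import random


def sample_hsvs_style_prior(group_sizes, pi=0.2, lam=1.0, sigma=1.0, seed=None):
    """Draw one coefficient vector from an HSVS-style two-level prior.

    Illustrative assumptions (hypothetical, not the paper's exact model):
      * gamma_g ~ Bernoulli(pi) decides whether group (gene) g is selected;
      * if gamma_g == 0, every coefficient in the group is exactly zero;
      * if gamma_g == 1, each within-group coefficient follows the Bayesian
        lasso scale mixture tau2 ~ Exponential(rate = lam**2 / 2),
        beta | tau2 ~ Normal(0, sigma**2 * tau2), whose marginal is
        Laplace with scale sigma / lam.
    """
    rng = random.Random(seed)
    gammas, betas = [], []
    for size in group_sizes:
        # Group-level spike-and-slab switch.
        gamma = 1 if rng.random() < pi else 0
        gammas.append(gamma)
        if gamma == 0:
            betas.append([0.0] * size)
            continue
        # Within-group Bayesian lasso: a normal with an exponential mixing variance.
        group = []
        for _ in range(size):
            tau2 = rng.expovariate(lam ** 2 / 2)
            group.append(rng.gauss(0.0, sigma * math.sqrt(tau2)))
        betas.append(group)
    return gammas, betas


if __name__ == "__main__":
    # Three hypothetical genes with 5, 3 and 8 probes each.
    gammas, betas = sample_hsvs_style_prior([5, 3, 8], pi=0.5, seed=1)
    for g, (gamma, beta) in enumerate(zip(gammas, betas)):
        print(g, gamma, [round(b, 3) for b in beta])
```

Under these assumptions, an unselected gene contributes only exact zeros, while a selected gene's probes are shrunk toward zero but not set to zero individually; posterior inference (for example, Gibbs sampling) would then combine such a prior with the outcome likelihood.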