Department of Health Statistics, Chongqing Medical University, Chongqing, China.
PLoS One. 2013 Jul 3;8(7):e67672. doi: 10.1371/journal.pone.0067672. Print 2013.
The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge arises from the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods based on univariate analyses have difficulty incorporating correlational, structural, or functional relationships among the molecular measures. For microarray gene expression data, we first summarize solutions for dealing with 'large p, small n' problems and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical model to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data for predicting stroke status are used to validate the performance of iBVS in a probit model with binary outcomes. iBVS offers a general framework for the effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiency are also discussed.
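To make "posterior selection probabilities" concrete, here is a minimal sketch. It is not iBVS and not the PLS g-prior. It assumes a Gaussian linear response (in a probit model this would play the role of the latent-variable step of a data-augmentation sampler), Zellner's standard g-prior with the unit-information choice g = n, an independent Bernoulli prior on each inclusion indicator, and exhaustive enumeration of all 2^p models, which is feasible only for small p. It uses the known closed-form Bayes factor against the intercept-only model, BF = (1+g)^((n-1-k)/2) / (1 + g(1-R^2))^((n-1)/2), where k is the number of included predictors. The function names are hypothetical.

```python
import itertools

import numpy as np


def g_prior_log_bf(y, X, g):
    """Log Bayes factor of the model using columns X vs. the intercept-only model,
    under Zellner's g-prior (flat priors on the intercept and log sigma assumed)."""
    n, k = X.shape
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ beta
    r2 = 1.0 - (resid @ resid) / (yc @ yc)  # coefficient of determination
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))


def posterior_inclusion_probs(y, X, g=None, prior_incl=0.5):
    """Posterior inclusion probability of each column, by enumerating all 2^p models."""
    n, p = X.shape
    if g is None:
        g = float(n)  # unit-information choice (an assumption, not from the paper)
    models = list(itertools.product([0, 1], repeat=p))
    log_post = np.empty(len(models))
    for m, gamma in enumerate(models):
        cols = [j for j in range(p) if gamma[j]]
        k = len(cols)
        # Independent Bernoulli(prior_incl) prior on each indicator gamma_j.
        log_prior = k * np.log(prior_incl) + (p - k) * np.log1p(-prior_incl)
        log_bf = g_prior_log_bf(y, X[:, cols], g) if cols else 0.0
        log_post[m] = log_bf + log_prior
    # Normalize posterior model probabilities stably on the log scale.
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    # P(gamma_j = 1 | y) = sum of posterior probabilities of models containing j.
    return np.array(models, dtype=float).T @ w


# Small simulated example: only predictors 0 and 2 affect the response.
rng = np.random.default_rng(0)
n, p = 100, 6
X = rng.standard_normal((n, p))
y = 1.5 * X[:, 0] - 1.0 * X[:, 2] + rng.standard_normal(n)
pip = posterior_inclusion_probs(y, X)
print(np.round(pip, 3))
```

For 'large p, small n' data, full enumeration is infeasible, so in practice the sum over models would be approximated by a stochastic search (for example, MCMC over the inclusion indicators) rather than computed exactly.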