IEEE/ACM Trans Comput Biol Bioinform. 2019 Nov-Dec;16(6):1986-1996. doi: 10.1109/TCBB.2018.2833487. Epub 2018 May 7.
Imaging genetics has attracted significant interest in recent studies. Traditional work has focused on mass-univariate statistical approaches that identify important single nucleotide polymorphisms (SNPs) associated with quantitative traits (QTs) of brain structure or function. More recently, to address the problems of multiple comparisons and weak detection, multivariate analysis methods such as the least absolute shrinkage and selection operator (Lasso) have often been used to select the SNPs most relevant to QTs. However, one problem with Lasso, as with many other feature selection methods for imaging genetics, is that useful prior information, e.g., the hierarchical structure among SNPs, is rarely used to design a more powerful model. In this paper, we propose to identify the associations between candidate genetic features (i.e., SNPs) and magnetic resonance imaging (MRI)-derived measures using a tree-guided sparse learning (TGSL) method. The advantage of our method is that it explicitly models the complex hierarchical structure among the SNPs in the objective function for feature selection. Specifically, motivated by biological knowledge, the hierarchical structures involving gene groups and linkage disequilibrium (LD) blocks as well as individual SNPs are imposed as a tree-guided regularization term in our TGSL model. Experimental studies on simulation data and the Alzheimer's Disease Neuroimaging Initiative (ADNI) data show that our method not only achieves better predictions than competing methods on the MRI-derived measures of AD-related regions of interest (ROIs) (i.e., hippocampus, parahippocampal gyrus, and precuneus), but also identifies sparse SNP patterns at the block level to better guide the biological interpretation.
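To make the idea of a tree-guided regularization term concrete, the following is a minimal sketch, not the authors' implementation. It assumes the generic tree-structured group Lasso form, in which a squared-error loss is penalized by a weighted sum of Euclidean norms over every node of the SNP tree (individual SNPs, LD blocks, genes). The abstract does not state the exact objective, so the function names, the six-SNP toy hierarchy, the unit node weights, and the value of `lam` are all hypothetical.

```python
import math

def group_norm(w, idx):
    """Euclidean norm of the coefficients w restricted to the indices in idx."""
    return math.sqrt(sum(w[i] ** 2 for i in idx))

def tree_penalty(w, groups, weights):
    """Weighted sum of group norms over every node of the tree (assumed form)."""
    return sum(wt * group_norm(w, idx) for idx, wt in zip(groups, weights))

def objective(X, y, w, groups, weights, lam):
    """Squared-error loss plus lam times the tree penalty (assumed form)."""
    resid = [yi - sum(xij * wj for xij, wj in zip(row, w)) for row, yi in zip(X, y)]
    loss = 0.5 * sum(r * r for r in resid)
    return loss + lam * tree_penalty(w, groups, weights)

# Hypothetical hierarchy over 6 SNPs:
#   gene A = SNPs 0-3, split into LD blocks {0, 1} and {2, 3}
#   gene B = SNPs 4-5, forming a single LD block {4, 5}
groups = [
    [0], [1], [2], [3], [4], [5],   # leaf level: individual SNPs
    [0, 1], [2, 3], [4, 5],         # LD-block level
    [0, 1, 2, 3], [4, 5],           # gene level
]
weights = [1.0] * len(groups)       # unit node weights, chosen for illustration

w_one_block = [3.0, 4.0, 0.0, 0.0, 0.0, 0.0]   # signal confined to one LD block
w_two_blocks = [3.0, 0.0, 4.0, 0.0, 0.0, 0.0]  # same magnitudes, two LD blocks

print(tree_penalty(w_one_block, groups, weights))   # 17.0
print(tree_penalty(w_two_blocks, groups, weights))  # 19.0
```

In this toy case the two coefficient vectors have identical magnitudes, yet the one confined to a single LD block incurs the smaller penalty (17.0 versus 19.0). Under the assumed penalty, this is the mechanism that favors sparse SNP patterns at the block level.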