IEEE Trans Med Imaging. 2018 Dec;37(12):2561-2571. doi: 10.1109/TMI.2017.2721301. Epub 2017 Jun 28.
Among the challenges arising in brain imaging genetic studies, estimating the potential links between neurological and genetic variability within a population is key. In this paper, we propose a multivariate, multimodal formulation for variable selection that leverages co-expression patterns across data modalities. Our approach is based on an intuitive combination of two widely used statistical models: sparse regression and canonical correlation analysis (CCA). While the former seeks multivariate linear relationships between a given phenotype and associated observations, the latter seeks to extract co-expression patterns between sets of variables belonging to different modalities. We propose to rely on a "CCA-type" formulation to regularize the classical multimodal sparse regression problem, essentially incorporating both the CCA and regression models within a unified formulation. The underlying motivation is to extract discriminative variables that are also co-expressed across modalities. We first show that the simplest formulation of such a model can be expressed as a special case of collaborative learning methods. After discussing its limitations, we propose an extended, more flexible formulation and introduce a simple and efficient alternating minimization algorithm to solve the associated optimization problem. We explore the parameter space and provide some guidelines regarding parameter selection. Both the original and extended versions are then compared on a simple toy data set and on a more advanced simulated imaging genomics data set in order to illustrate the benefits of the latter. Finally, we validate the proposed formulation using single nucleotide polymorphism data and functional magnetic resonance imaging data from a population of adolescents (subjects aged 16.9 ± 1.9 years, from the Philadelphia Neurodevelopmental Cohort) for the study of learning ability. Furthermore, we carry out a significance analysis of the resulting features, which allows us to carefully extract brain regions and genes linked to learning and cognitive ability.
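As a rough illustration of the idea described above (a sparse regression on two modalities, regularized by a CCA-type term that pulls the two modality-specific predictions together), the following minimal Python sketch may help. It is not the formulation or the algorithm from the paper. The objective, the function names (`soft_threshold`, `objective`, `fit_coexpression_regression`), the parameters `gamma` and `lam`, and the use of alternating proximal-gradient updates are all assumptions chosen for illustration.

```python
import numpy as np

# Assumed illustrative objective (not taken from the paper):
#   0.5 * ||y - X1 b1 - X2 b2||^2          regression fit
# + 0.5 * gamma * ||X1 b1 - X2 b2||^2      CCA-type co-expression penalty
# + lam * (||b1||_1 + ||b2||_1)            sparsity penalty


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(y, X1, X2, b1, b2, gamma, lam):
    """Value of the assumed objective at (b1, b2)."""
    fit = y - X1 @ b1 - X2 @ b2
    gap = X1 @ b1 - X2 @ b2
    l1 = np.abs(b1).sum() + np.abs(b2).sum()
    return 0.5 * fit @ fit + 0.5 * gamma * gap @ gap + lam * l1


def fit_coexpression_regression(y, X1, X2, gamma=1.0, lam=0.1, n_iter=200):
    """Alternate one proximal-gradient step on b1 and one on b2."""
    b1 = np.zeros(X1.shape[1])
    b2 = np.zeros(X2.shape[1])
    # Block Lipschitz constants of the smooth part of the objective.
    lip1 = (1.0 + gamma) * np.linalg.norm(X1, 2) ** 2
    lip2 = (1.0 + gamma) * np.linalg.norm(X2, 2) ** 2
    history = []
    for _ in range(n_iter):
        # Update b1 with b2 held fixed.
        resid = y - X1 @ b1 - X2 @ b2
        gap = X1 @ b1 - X2 @ b2
        grad1 = -X1.T @ resid + gamma * (X1.T @ gap)
        b1 = soft_threshold(b1 - grad1 / lip1, lam / lip1)
        # Update b2 with the new b1 held fixed.
        resid = y - X1 @ b1 - X2 @ b2
        gap = X1 @ b1 - X2 @ b2
        grad2 = -X2.T @ resid - gamma * (X2.T @ gap)
        b2 = soft_threshold(b2 - grad2 / lip2, lam / lip2)
        history.append(objective(y, X1, X2, b1, b2, gamma, lam))
    return b1, b2, history


if __name__ == "__main__":
    # Synthetic data only; no real imaging or genomic data is used here.
    rng = np.random.default_rng(0)
    X1 = rng.standard_normal((40, 8))
    X2 = rng.standard_normal((40, 6))
    y = X1[:, 0] + X2[:, 0] + 0.1 * rng.standard_normal(40)
    b1, b2, history = fit_coexpression_regression(y, X1, X2, gamma=0.5, lam=0.5)
    print("nonzero coefficients:", np.count_nonzero(b1), np.count_nonzero(b2))
    print("final objective:", history[-1])
```

In this sketch, larger values of `gamma` push the two modality-specific predictions `X1 @ b1` and `X2 @ b2` toward agreement, while `lam` controls how many variables are selected. Because each block step uses the inverse of that block's Lipschitz constant as its step size, the recorded objective values should not increase from one iteration to the next.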