Deprez Marie, Moreira Julien, Sermesant Maxime, Lorenzi Marco
University of Côte d'Azur, Nice, France.
INRIA, Epione Project-Team, Valbonne, France.
Front Mol Med. 2022 Mar 30;2:830956. doi: 10.3389/fmmed.2022.830956. eCollection 2022.
The applicability of multivariate approaches to the joint analysis of genomics and phenomics information is currently limited by a lack of scalability and by the difficulty of interpreting the related findings from a biological perspective. To tackle these limitations, we present Bayesian Genome-to-Phenome Sparse Regression (G2PSR), a novel multivariate regression method based on sparse SNP-gene constraints. The statistical framework of G2PSR is based on a Bayesian neural network, where constraints on SNP-gene associations are integrated by incorporating knowledge linking variants to their respective genes; the resulting gene-level representation is then used to reconstruct the phenotypic data in the output layer. Interpretability is promoted by inducing sparsity on the genes through variational dropout, which allows estimation of the uncertainty associated with each gene, and its related SNPs, in the reconstruction task. Ultimately, G2PSR is designed to avoid the need for multiple testing correction and to assess the combined effect of SNPs, thus increasing the statistical power in detecting genome-to-phenome associations. The effectiveness of G2PSR was demonstrated on synthetic and real data, in comparison with state-of-the-art methods based on group-wise sparsity constraints. The application to real data consisted of an imaging-genetics analysis of the Alzheimer's Disease Neuroimaging Initiative data, relating SNPs from more than 3,500 genes to clinical and multivariate brain volumetric information. The experimental results show that our method can provide accurate selection of relevant genes in datasets with a large SNP-to-sample ratio, thus overcoming the main limitations of current genome-to-phenome association methods.
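The architecture described in the abstract can be illustrated with a minimal NumPy sketch of the forward pass only. This is not the authors' implementation: the function names, the single linear layer per stage, and the use of one multiplicative Gaussian noise parameter `log_alpha` per gene (the usual Gaussian-dropout parameterisation, where the dropout probability is alpha / (1 + alpha)) are all assumptions made for illustration. Training the weights and `log_alpha` by variational inference is omitted.

```python
import numpy as np

def build_snp_gene_mask(snp_to_gene, n_genes):
    """Binary (n_snps, n_genes) mask: entry (i, j) is 1 if SNP i is mapped to gene j."""
    n_snps = len(snp_to_gene)
    mask = np.zeros((n_snps, n_genes))
    mask[np.arange(n_snps), snp_to_gene] = 1.0
    return mask

def forward(X, W_snp, mask, log_alpha, W_out, rng=None):
    """SNPs -> gene-level representation -> reconstructed phenotypes.

    Hypothetical sketch: weights outside the SNP-gene mask are zeroed, so each
    gene unit only sees its own SNPs. If `rng` is given, per-gene multiplicative
    Gaussian noise with variance alpha = exp(log_alpha) is applied.
    """
    gene_repr = X @ (W_snp * mask)              # (n_samples, n_genes)
    if rng is not None:
        alpha = np.exp(log_alpha)               # (n_genes,)
        noise = rng.standard_normal(gene_repr.shape)
        gene_repr = gene_repr * (1.0 + np.sqrt(alpha) * noise)
    return gene_repr @ W_out                    # (n_samples, n_phenotypes)

def gene_dropout_probability(log_alpha):
    """Dropout probability per gene, p = alpha / (1 + alpha)."""
    alpha = np.exp(log_alpha)
    return alpha / (1.0 + alpha)

# Toy example: 6 SNPs assigned to 3 genes, 5 samples, 2 phenotypic features.
rng = np.random.default_rng(0)
snp_to_gene = np.array([0, 0, 1, 1, 1, 2])      # made-up SNP-to-gene annotation
mask = build_snp_gene_mask(snp_to_gene, n_genes=3)
X = rng.integers(0, 3, size=(5, 6)).astype(float)   # genotypes coded 0/1/2
W_snp = rng.standard_normal((6, 3))
W_out = rng.standard_normal((3, 2))
log_alpha = np.array([-3.0, 0.0, 3.0])          # illustrative values, not learned

Y_hat = forward(X, W_snp, mask, log_alpha, W_out)   # deterministic pass
p_drop = gene_dropout_probability(log_alpha)
```

In this sketch, a gene whose dropout probability approaches 1 contributes mostly noise to the reconstruction and would be a candidate for pruning, while a gene with a low dropout probability would be retained as relevant. The threshold used to make that decision is a modelling choice not specified here.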