McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA.
Department of Computer Science and Engineering, Texas A&M University, College Station, TX, 77843, USA.
Commun Biol. 2024 Apr 5;7(1):414. doi: 10.1038/s42003-024-06096-7.
Understanding the genetic architecture of brain structure is challenging, partly due to difficulties in designing robust, unbiased descriptors of brain morphology. Until recently, brain measures for genome-wide association studies (GWAS) consisted of traditional expert-defined or software-derived image-derived phenotypes (IDPs) that are often based on theoretical preconceptions or computed from limited amounts of data. Here, we present an approach to derive brain imaging phenotypes using unsupervised deep representation learning. We train a 3-D convolutional autoencoder model with reconstruction loss on 6130 UK Biobank (UKBB) participants' T1 or T2-FLAIR (T2) brain MRIs to create a 128-dimensional representation known as Unsupervised Deep learning derived Imaging Phenotypes (UDIPs). GWAS of these UDIPs in held-out UKBB subjects (n = 22,880 discovery and n = 12,359/11,265 replication cohorts for T1/T2) identified 9457 significant SNPs organized into 97 independent genetic loci, of which 60 loci were replicated. Twenty-six loci were not reported in earlier T1 and T2 IDP-based UK Biobank GWAS. We developed a perturbation-based decoder interpretation approach to show that these loci are associated with UDIPs mapped to multiple relevant brain regions. Our results establish that unsupervised deep learning can derive robust, unbiased, heritable, and interpretable brain imaging phenotypes.
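The abstract does not spell out how the perturbation-based decoder interpretation works, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes a decoder that maps a latent vector to voxel intensities, shifts one latent dimension, and measures how much each voxel of the decoded image changes. The names `perturbation_map`, `region_scores`, `toy_decode`, `WEIGHTS`, and `LABELS` are hypothetical, and the linear two-dimensional decoder stands in for the trained 3-D convolutional decoder with its 128-dimensional latent space.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def perturbation_map(decode: Callable[[Sequence[float]], Vector],
                     latent: Sequence[float],
                     dim: int,
                     delta: float = 1.0) -> Vector:
    """Absolute per-voxel change in the decoded image when one
    latent dimension is shifted by `delta`."""
    baseline = decode(latent)
    shifted = list(latent)
    shifted[dim] += delta
    perturbed = decode(shifted)
    return [abs(p - b) for p, b in zip(perturbed, baseline)]


def region_scores(voxel_map: Sequence[float],
                  labels: Sequence[str]) -> Dict[str, float]:
    """Mean perturbation effect within each labeled region."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for value, label in zip(voxel_map, labels):
        totals[label] = totals.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


# Hypothetical toy decoder: 2 latent dimensions -> 4 voxels (linear map).
WEIGHTS = [
    [1.0, 0.0],
    [0.5, 0.0],
    [0.0, 2.0],
    [0.0, 0.0],
]


def toy_decode(z: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, z)) for row in WEIGHTS]


# Hypothetical atlas assigning each voxel to a region.
LABELS = ["region_a", "region_a", "region_b", "region_b"]

# Perturb latent dimension 0 and summarize the effect per region.
effect = perturbation_map(toy_decode, [0.5, -0.25], dim=0, delta=1.0)
scores = region_scores(effect, LABELS)
```

In this toy setup, latent dimension 0 only drives the voxels labeled `region_a`, so the per-region summary attributes its effect there. With a real decoder the same pattern would be applied to decoded 3-D volumes and an anatomical atlas.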