Friedman Sam Freesun, Moran Gemma Elyse, Rakic Marianne, Phillipakis Anthony
Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Statistics Department, Rutgers University, New Brunswick, NJ, USA.
Bioinform Biol Insights. 2024 Sep 28;18:11779322241282489. doi: 10.1177/11779322241282489. eCollection 2024.
The advent of biobanks with vast quantities of medical imaging and paired genetic measurements creates huge opportunities for a new generation of genotype-phenotype association studies. However, disentangling biological signals from the many sources of bias and artifacts remains difficult. Using diverse medical images and time-series (ie, magnetic resonance images [MRIs], electrocardiograms [ECGs], and dual-energy X-ray absorptiometries [DXAs]), we show how registration, both spatial and temporal, whether guided by domain knowledge or learned, helps uncover biological information. A multimodal autoencoder comparison framework quantifies and characterizes how registration affects the representations that unsupervised and self-supervised encoders learn. In this study, we (1) train autoencoders before and after registration with nine diverse types of medical images, (2) demonstrate how neural network-based methods (VoxelMorph, DeepCycle, and DropFuse) can effectively learn registrations, allowing for more flexible and efficient processing than is possible with hand-crafted registration techniques, and (3) conduct exhaustive phenotypic screening, comprising millions of statistical tests, to quantify how registration affects the generalizability of learned representations. Genome- and phenome-wide association studies (GWAS and PheWAS) uncover significantly more associations with registered modality representations than with equivalently trained and sized representations learned from native coordinate spaces. Specifically, registered PheWAS yielded 61 more disease associations for ECGs, 53 more disease associations for cardiac MRIs, and 10 more disease associations for brain MRIs. Registration also yields significant increases in the coefficient of determination when regressing continuous phenotypes (eg, 0.36 ± 0.01 for ECGs and 0.11 ± 0.02 for DXA scans).
Our findings reveal the crucial role registration plays in enhancing the characterization of physiological states across a broad range of medical imaging data types. Importantly, this finding extends to more flexible types of registration, such as the cross-modal and the circular mapping methods presented here.
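The intuition behind the coefficient-of-determination result can be illustrated with a toy temporal registration. The sketch below is not the study's pipeline (which trains autoencoders and uses methods such as VoxelMorph, DeepCycle, and DropFuse); it is a minimal illustration under stated assumptions. All names (`make_trace`, `register`, `first_moment`, `REF_INDEX`) and all data are hypothetical and synthetic: each "trace" has a main peak and a smaller peak whose delay plays the role of a continuous phenotype, and each trace is circularly shifted by a random amount to mimic an arbitrary native coordinate frame. Registration here is simple peak alignment, and the "representation" is a fixed linear readout (the first moment of the trace).

```python
import math
import random
import statistics

random.seed(0)
N_SAMPLES = 64     # samples per synthetic trace (assumption)
N_SUBJECTS = 1000  # number of synthetic subjects (assumption)
REF_INDEX = 16     # index where registration places the main peak


def bump(t, center, width=2.0):
    """Gaussian-shaped peak evaluated at sample index t."""
    return math.exp(-0.5 * ((t - center) / width) ** 2)


def roll(trace, k):
    """Circularly shift a list to the right by k samples."""
    k %= len(trace)
    return trace[-k:] + trace[:-k] if k else list(trace)


def make_trace(interval, shift):
    """Main peak plus a smaller peak `interval` samples later, then a circular shift."""
    trace = [
        bump(t, REF_INDEX)
        + 0.4 * bump(t, REF_INDEX + interval)
        + random.gauss(0.0, 0.002)  # small measurement noise
        for t in range(N_SAMPLES)
    ]
    return roll(trace, shift)


def register(trace):
    """Toy temporal registration: shift so the largest peak sits at REF_INDEX."""
    peak = max(range(len(trace)), key=trace.__getitem__)
    return roll(trace, REF_INDEX - peak)


def first_moment(trace):
    """A fixed, coordinate-dependent linear readout of the trace."""
    return sum(t * x for t, x in enumerate(trace))


def r_squared(feature, target):
    """R^2 of a one-predictor least-squares fit (squared Pearson correlation)."""
    mx, my = statistics.fmean(feature), statistics.fmean(target)
    sxy = sum((a - mx) * (b - my) for a, b in zip(feature, target))
    sxx = sum((a - mx) ** 2 for a in feature)
    syy = sum((b - my) ** 2 for b in target)
    return sxy * sxy / (sxx * syy)


# Synthetic phenotype: delay between the two peaks, in samples.
intervals = [random.uniform(10.0, 20.0) for _ in range(N_SUBJECTS)]
native = [make_trace(d, random.randrange(N_SAMPLES)) for d in intervals]
registered = [register(trace) for trace in native]

r2_native = r_squared([first_moment(tr) for tr in native], intervals)
r2_registered = r_squared([first_moment(tr) for tr in registered], intervals)
print(f"R^2 native: {r2_native:.3f}  registered: {r2_registered:.3f}")
```

In this toy setting the random shift dominates the readout in native coordinates, so the phenotype explains almost none of its variance; after peak alignment the same readout varies nearly linearly with the delay. The real study compares learned representations rather than a hand-picked readout, so this only conveys the direction of the effect, not its magnitude.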