Department of Computer Science, Princeton University, Princeton, NJ, USA.
Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
Nat Commun. 2021 Mar 11;12(1):1609. doi: 10.1038/s41467-021-21727-x.
Histopathological images are used to characterize complex phenotypes such as tumor stage. Our goal is to associate features of stained tissue images with high-dimensional genomic markers. We use convolutional autoencoders and sparse canonical correlation analysis (CCA) on paired histological images and bulk gene expression to identify subsets of genes whose expression levels in a tissue sample correlate with subsets of morphological features from the corresponding sample image. We apply our approach, ImageCCA, to two TCGA data sets, and find gene sets associated with the structure of the extracellular matrix and cell wall infrastructure, implicating uncharacterized genes in extracellular processes. We find sets of genes associated with specific cell types, including neuronal cells and cells of the immune system. We apply ImageCCA to the GTEx v6 data, and find image features that capture population variation in thyroid and in colon tissues associated with genetic variants (image morphology QTLs, or imQTLs), suggesting that genetic variation regulates population variation in tissue morphological traits.
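The core pairing step, sparse CCA between image-derived features and gene expression, can be illustrated with a minimal sketch. It assumes the convolutional autoencoder has already reduced each image to a fixed-length feature vector, so that step is omitted. It uses a simplified alternating soft-thresholding scheme in the spirit of penalized-matrix-decomposition sparse CCA, which treats within-view covariance as the identity. Two simplifications are assumptions, not necessarily what ImageCCA does: the L1-bound search is replaced by a fixed relative threshold (`frac_x`, `frac_y`), and only one canonical pair is computed. All function names and the toy data are hypothetical.

```python
import math
import random


def standardize(columns):
    """Center each feature column and scale it to unit variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0
        out.append([(x - mean) / sd for x in col])
    return out


def soft_threshold(a, t):
    """Shrink a toward zero by t; values with |a| <= t become exactly 0."""
    return math.copysign(max(abs(a) - t, 0.0), a)


def sparse_unit(w, frac):
    """Soft-threshold w at frac * max|w|, then rescale to unit L2 norm."""
    t = frac * max(abs(x) for x in w)
    s = [soft_threshold(x, t) for x in w]
    norm = math.sqrt(sum(x * x for x in s))
    return [x / norm for x in s] if norm > 0 else s


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def sparse_cca(x_cols, y_cols, frac_x=0.5, frac_y=0.5, iters=50):
    """One pair of sparse canonical weights (u, v) via alternating updates.

    x_cols: p image-feature columns; y_cols: q gene columns. Each column
    holds one value per tissue sample, in the same sample order.
    """
    xs, ys = standardize(x_cols), standardize(y_cols)
    n, p, q = len(xs[0]), len(xs), len(ys)
    # Cross-correlation matrix C (p x q) between standardized features.
    c = [[sum(a * b for a, b in zip(xj, yk)) / n for yk in ys] for xj in xs]
    # Start v at the gene whose column of C has the largest norm.
    k0 = max(range(q), key=lambda k: sum(c[j][k] ** 2 for j in range(p)))
    v = [1.0 if k == k0 else 0.0 for k in range(q)]
    u = [0.0] * p
    for _ in range(iters):
        u = sparse_unit([sum(c[j][k] * v[k] for k in range(q)) for j in range(p)], frac_x)
        v = sparse_unit([sum(c[j][k] * u[j] for j in range(p)) for k in range(q)], frac_y)
    # Canonical variates: one projected score per sample on each side.
    sx = [sum(u[j] * xs[j][i] for j in range(p)) for i in range(n)]
    sy = [sum(v[k] * ys[k][i] for k in range(q)) for i in range(n)]
    return u, v, pearson(sx, sy)


def make_toy_data(n=200, seed=0):
    """Hypothetical data: image features 0-1 and genes 0-2 track a shared
    latent signal z; the remaining columns are pure noise."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x_cols = [[zi + rng.gauss(0, 0.5) for zi in z] for _ in range(2)]
    x_cols += [[rng.gauss(0, 1) for _ in range(n)] for _ in range(4)]
    y_cols = [[zi + rng.gauss(0, 0.5) for zi in z] for _ in range(3)]
    y_cols += [[rng.gauss(0, 1) for _ in range(n)] for _ in range(5)]
    return x_cols, y_cols


if __name__ == "__main__":
    u, v, r = sparse_cca(*make_toy_data())
    print("image features selected:", [j for j, w in enumerate(u) if w != 0])
    print("genes selected:", [k for k, w in enumerate(v) if w != 0])
    print(f"canonical correlation: {r:.2f}")
```

On the toy data, the nonzero entries of `u` and `v` should pick out the signal-carrying image features and genes, which mirrors how sparsity yields interpretable gene subsets paired with feature subsets. Raising `frac_x` or `frac_y` zeroes out more weights; real analyses would tune the sparsity penalty and extract several canonical pairs rather than one.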