School of Computer Science and Engineering, Central South University, 410075, Changsha, China.
School of Computer Science and Technology, Nanjing Tech University, 211816, Nanjing, China.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac294.
Digital pathological analysis serves as the main examination for cancer diagnosis. Recently, deep learning-driven feature extraction from pathology images has been able to detect genetic variations and the tumor environment, but few studies have focused on differential gene expression in tumor cells.
In this paper, we propose a self-supervised contrastive learning framework, HistCode, to infer differential gene expression from whole slide images (WSIs). We leveraged contrastive learning on large-scale unannotated WSIs to derive slide-level histopathological features in latent space, and then transferred these features to tumor diagnosis and the prediction of differentially expressed cancer driver genes. Our experiments showed that our method outperformed other state-of-the-art models in tumor diagnosis tasks and also effectively predicted differential gene expression. Interestingly, we found that genes with higher fold change can be predicted more precisely. To intuitively illustrate the model's ability to extract informative features from pathological images, we spatially visualized the WSIs colored by the attention scores of image tiles. We found that the tumor and necrosis areas were highly consistent with the annotations of experienced pathologists. Moreover, the spatial heatmap generated by lymphocyte-specific gene expression patterns was also consistent with the manually labeled WSIs.
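The abstract mentions slide-level features and per-tile attention scores but does not say how tiles are aggregated. As an illustration only, the following minimal sketch assumes a generic attention-based pooling step: each tile embedding receives a softmax-normalized score, and the slide-level vector is the score-weighted sum of the tiles. The function name `attention_pool` and the parameters `v` and `w` are hypothetical and are not taken from HistCode; the random embeddings stand in for features that a contrastive encoder would produce.

```python
import math
import random


def attention_pool(tiles, v, w):
    """Pool tile embeddings into one slide-level vector (illustrative only).

    tiles: list of n tile embeddings, each a list of d floats
    v:     hypothetical hidden projection, k rows of d floats
    w:     hypothetical scoring vector of k floats
    Returns (slide_vector, attention_scores).
    """
    # Unnormalized score per tile: w . tanh(V h)
    logits = []
    for h in tiles:
        hidden = [math.tanh(sum(vj * hj for vj, hj in zip(row, h))) for row in v]
        logits.append(sum(wi * ui for wi, ui in zip(w, hidden)))

    # Numerically stable softmax over tiles
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    scores = [e / total for e in exps]

    # Slide-level feature: attention-weighted sum of tile embeddings
    d = len(tiles[0])
    slide = [sum(a * h[j] for a, h in zip(scores, tiles)) for j in range(d)]
    return slide, scores


# Toy usage with random stand-in embeddings (6 tiles, 4 dimensions)
rng = random.Random(0)
tiles = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]
v = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
w = [rng.gauss(0, 1) for _ in range(3)]
slide_vec, tile_scores = attention_pool(tiles, v, w)
```

Under this assumption, the per-tile scores returned here are the kind of quantity that could be mapped back to tile coordinates to color a WSI heatmap.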