Cutaneous Biology Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02129, USA.
Department of Computer Science, City University of Hong Kong, Kowloon Tong 999077, Hong Kong SAR.
Bioinformatics. 2024 Jun 3;40(6). doi: 10.1093/bioinformatics/btae343.
Tissue context and molecular profiling are commonly used measures in understanding normal development and disease pathology. In recent years, the development of spatial molecular profiling technologies (e.g. spatially resolved transcriptomics) has enabled the exploration of quantitative links between tissue morphology and gene expression. However, these technologies remain expensive and time-consuming, with subsequent analyses necessitating high-throughput pathological annotations. On the other hand, existing computational tools are limited to predicting only a few dozen to several hundred genes, and the majority of the methods are designed for bulk RNA-seq.
In this context, we propose HE2Gene, the first multi-task learning-based method capable of predicting tens of thousands of spot-level gene expressions along with pathological annotations from H&E-stained images. Experimental results demonstrate that HE2Gene is comparable to state-of-the-art methods and generalizes well on an external dataset without the need for re-training. Moreover, HE2Gene preserves the annotated spatial domains and has the potential to identify biomarkers. This capability facilitates cancer diagnosis and broadens its applicability to investigate gene-disease associations.
The source code and data information have been deposited at https://github.com/Microbiods/HE2Gene.