A visual-omics foundation model to bridge histopathology image with transcriptomics.
Authors
Chen Weiqing, Zhang Pengzhi, Tran Tu N, Xiao Yiwei, Li Shengyu, Shah Vrutant V, Cheng Hao, Brannan Kristopher W, Youker Keith, Li Lai, Fang Longhou, Yang Yu, Le Nhat-Tu, Abe Jun-Ichi, Chen Shu-Hsia, Ma Qin, Chen Ken, Song Qianqian, Cooke John P, Wang Guangyu
Affiliations
Center for Bioinformatics and Computational Biology, Houston Methodist Research Institute, Houston, TX, 77030, USA.
Department of Physiology, Biophysics & Systems Biology, Weill Cornell Graduate School of Medical Science, Cornell University, New York, NY, 10065, USA.
Publication information
Res Sq. 2025 Apr 16:rs.3.rs-5183775. doi: 10.21203/rs.3.rs-5183775/v1.
Artificial intelligence has revolutionized computational biology. Recent developments in omics technologies, including single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST), provide detailed genomic data alongside tissue histology. However, current computational models focus on either omics or image analysis and lack integration of the two. To address this, we developed OmiCLIP, a visual-omics foundation model linking hematoxylin and eosin (H&E) images and transcriptomics using tissue patches from Visium data. We transformed transcriptomic data into "sentences" by concatenating the top-expressed gene symbols from each patch. We curated a dataset of 2.2 million paired tissue images and transcriptomic profiles across 32 organs to train OmiCLIP to integrate histology and transcriptomics. Building on OmiCLIP, our Loki platform offers five key functions: tissue alignment, annotation via bulk RNA-seq or marker genes, cell type decomposition, image-transcriptomics retrieval, and ST gene expression prediction from H&E images. Compared with 22 state-of-the-art models on 5 simulated, 19 public, and 4 in-house experimental datasets, Loki demonstrated consistent accuracy and robustness.
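The abstract's key preprocessing step, turning a spot's transcriptomic profile into a text "sentence" of top-expressed gene symbols, can be sketched as follows. This is a minimal illustration, not the authors' code: the input format (a mapping from gene symbol to expression value) and the `top_k` cutoff are assumptions for the example.

```python
def expression_to_sentence(expr, top_k=50):
    """Rank genes by expression (descending) and join the top_k symbols
    into a space-separated "sentence", as described in the abstract.

    expr: dict mapping gene symbol -> expression value (hypothetical format).
    """
    ranked = sorted(expr, key=expr.get, reverse=True)
    return " ".join(ranked[:top_k])

# Toy spot-level profile (illustrative values only)
spot = {"COL1A1": 9.2, "ACTB": 8.7, "MYH7": 5.1, "GAPDH": 7.9}
print(expression_to_sentence(spot, top_k=3))  # COL1A1 ACTB GAPDH
```

Such sentences can then be fed to a CLIP-style text encoder and contrasted against embeddings of the paired H&E image patch during training.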