Srinivasan Gokul, Le Minh-Khang, Azher Zarif, Liu Xiaoying, Vaickus Louis, Kaur Harsimran, Kolling Fred, Palisoul Scott, Perreard Laurent, Lau Ken S, Yao Keluo, Levy Joshua
Departments of Pathology and Laboratory Medicine and Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048.
California Institute of Technology, Pasadena, CA 91125.
medRxiv. 2025 Apr 23:2025.04.22.25326170. doi: 10.1101/2025.04.22.25326170.
Colorectal cancer (CRC) remains a major health concern, with over 150,000 new diagnoses and more than 50,000 deaths annually in the United States, underscoring an urgent need for improved screening, prognostication, disease management, and therapeutic approaches. The tumor microenvironment (TME), comprising cancerous and immune cells interacting within the tumor's spatial architecture, plays a critical role in disease progression and treatment outcomes, reinforcing its importance as a prognostic marker for metastasis and recurrence risk. However, traditional methods for TME characterization, such as bulk transcriptomics and multiplex protein assays, lack sufficient spatial resolution. Although spatial transcriptomics (ST) can map whole transcriptomes at near-cellular resolution, current ST technologies (e.g., Visium, Xenium) are limited by high costs, low throughput, and reproducibility issues, which prevent their widespread application in large-scale molecular epidemiology studies. In this study, we refined and implemented Virtual RNA Inference (VRI) to derive ST-level molecular information directly from hematoxylin and eosin (H&E)-stained tissue images. Our VRI models were trained on the largest matched CRC ST dataset to date, comprising 45 patients and more than 300,000 Visium spots from primary tumors. Using state-of-the-art architectures (UNI, ResNet-50, ViT, and VMamba), we achieved a median Spearman's correlation coefficient of 0.546 between predicted and measured spot-level expression. As validation, VRI-derived gene signatures linked to specific tissue regions (tumor, interface, submucosa, stroma, serosa, muscularis, inflammation) showed strong concordance with signatures generated via direct ST, and VRI accurately estimated spatial cell-type proportions from H&E slides.
In an expanded CRC cohort, after controlling for tumor invasiveness and clinical factors, we further identified VRI-derived gene signatures significantly associated with key prognostic outcomes, including metastasis status. Although certain tumor-related pathways are not fully captured by histology alone, our findings highlight the ability of VRI to infer a wide range of "histology-associated" biological pathways at near-cellular resolution without requiring ST profiling. Future work will extend this framework to broader TME phenotyping from standard H&E tissue images, with the potential to accelerate translational CRC research at scale.
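The headline metric above is a median Spearman's correlation between predicted and measured spot-level expression. The abstract does not say whether the correlation is taken per gene (across spots) or per spot (across genes). The minimal sketch below assumes the per-gene variant: it rank-transforms each gene's predicted and measured values across all spots, takes the Pearson correlation of those ranks (which is Spearman's rho), and reports the median over genes. The helper names `average_ranks`, `spearman_rho`, and `median_spearman` are hypothetical and are not the authors' code.

```python
import numpy as np


def average_ranks(x):
    """Rank a 1-D array from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        # Sorted positions i..j (0-based) share the average 1-based rank.
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman's rho: the Pearson correlation of the rank-transformed vectors."""
    ra = average_ranks(a)
    rb = average_ranks(b)
    ra -= ra.mean()
    rb -= rb.mean()
    denom = np.sqrt((ra ** 2).sum() * (rb ** 2).sum())
    if denom == 0.0:
        return np.nan  # undefined when either vector is constant
    return float((ra * rb).sum() / denom)


def median_spearman(pred, meas):
    """Assumed metric: per-gene Spearman across spots, then the median over genes.

    pred, meas: arrays of shape (n_spots, n_genes).
    Returns (median rho, array of per-gene rho); constant genes yield NaN and
    are ignored by the median.
    """
    pred = np.asarray(pred, dtype=float)
    meas = np.asarray(meas, dtype=float)
    if pred.shape != meas.shape:
        raise ValueError("pred and meas must have the same shape")
    per_gene = np.array(
        [spearman_rho(pred[:, g], meas[:, g]) for g in range(pred.shape[1])]
    )
    return float(np.nanmedian(per_gene)), per_gene
```

Because Spearman's rho depends only on ranks, any monotone rescaling of the predictions (e.g., log versus raw counts) leaves the score unchanged. Applying `scipy.stats.spearmanr` column by column would give the same per-gene values. The hand-rolled version only makes the rank-then-correlate step explicit.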