Niu Jinyun, Zhu Fangfang, Xu Taosheng, Wang Shunfang, Min Wenwen
School of Information Science and Engineering, Yunnan University, Kunming, 650091, Yunnan, China.
School of Health and Nursing, Yunnan Open University, Kunming, 650599, Yunnan, China.
Comput Struct Biotechnol J. 2024 Dec 2;23:4369-4383. doi: 10.1016/j.csbj.2024.11.041. eCollection 2024 Dec.
The rapid development of spatial transcriptomics (ST) technology has provided unprecedented opportunities to understand tissue relationships and functions within specific spatial contexts. Accurate identification of spatial domains is crucial for downstream spatial transcriptomics analysis. However, effectively combining gene expression data, histological images, and spatial coordinate data to identify spatial domains remains a challenge. To this end, we propose STMVGAE, a novel spatial transcriptomics analysis tool that combines a multi-view variational graph autoencoder with a consensus clustering framework. STMVGAE begins by extracting histological image features using a pre-trained convolutional neural network (CNN) and integrates these features with gene expression data to generate augmented gene expression profiles. Subsequently, multiple graphs (views) are constructed using various similarity measures, capturing different aspects of the spatial and transcriptional relationships. These views, combined with the augmented gene expression data, are then processed through variational graph autoencoders (VGAEs) to learn multiple low-dimensional latent embeddings. Finally, the model employs a consensus clustering method to integrate the clustering results derived from these embeddings, significantly improving clustering accuracy and stability. We applied STMVGAE to five real datasets and compared it with five state-of-the-art methods, showing that STMVGAE consistently achieves competitive results. We assessed its capabilities in spatial domain identification and evaluated its performance across various downstream tasks, including UMAP visualization, PAGA trajectory inference, spatially variable gene (SVG) identification, denoising, batch integration, and other analyses. All code and public datasets used in this paper are available at https://github.com/wenwenmin/STMVGAE and https://zenodo.org/records/13119867.
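The final step, integrating clusterings obtained from several embeddings, can be illustrated with a small sketch. The abstract does not specify which consensus method STMVGAE uses, so the following assumes a generic co-association approach: count how often each pair of spots shares a cluster across views, link pairs that agree in a majority of views, and take connected components as consensus clusters. The function names `co_association` and `consensus_labels`, the `threshold` parameter, and the toy labels are all hypothetical and are not taken from the STMVGAE code.

```python
import numpy as np

def co_association(partitions):
    """Fraction of partitions in which each pair of spots shares a cluster."""
    partitions = np.asarray(partitions)            # shape (n_views, n_spots)
    same = partitions[:, :, None] == partitions[:, None, :]
    return same.mean(axis=0)                       # shape (n_spots, n_spots)

def consensus_labels(partitions, threshold=0.5):
    """Link spots co-clustered in more than `threshold` of the partitions,
    then label the connected components of the resulting graph.
    (Assumed generic scheme, not the method from the paper.)"""
    linked = co_association(partitions) > threshold
    n_spots = linked.shape[0]
    labels = np.full(n_spots, -1, dtype=int)
    current = 0
    for start in range(n_spots):
        if labels[start] != -1:
            continue                               # already assigned
        labels[start] = current
        stack = [start]
        while stack:                               # depth-first traversal
            i = stack.pop()
            for j in np.flatnonzero(linked[i]):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy input: three per-view clusterings of six spots.
# Label IDs are arbitrary within each view; only co-membership matters.
views = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],   # spot 2 is grouped differently in this view
    [2, 2, 2, 0, 0, 0],
]
labels = consensus_labels(views)
print(labels)
```

In this toy case the second view disagrees about spot 2, but the other two views outvote it, so the consensus keeps spot 2 with spots 0 and 1. Working on co-membership rather than raw label IDs avoids having to match cluster labels between views.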