Zhang Meng, Parker Joel, An Lingling, Liu Yiwen, Sun Xiaoxiao
Department of Mathematics, University of Arizona, 617 N. Santa Rita Ave., Tucson, AZ, 85721, USA.
Department of Epidemiology and Biostatistics, University of Arizona, 1295 N. Martin Ave., Tucson, AZ, 85721, USA.
BMC Bioinformatics. 2025 Jan 31;26(1):35. doi: 10.1186/s12859-025-06054-y.
Spatial transcriptomics is a state-of-the-art technique that allows researchers to study gene expression patterns in tissues across the spatial domain. Because of technical limitations, most spatial transcriptomics techniques provide bulk (multi-cell) data for each sequencing spot. Deconvolution is therefore essential for obtaining high-resolution spatial transcriptomics data. Most existing deconvolution methods rely on reference data (e.g., single-cell data), which may not be available in real applications. Current reference-free methods are limited by their dependence on distributional assumptions, their reliance on marker genes, or their failure to leverage histology and spatial information. Consequently, there is a critical need for highly flexible, robust, and user-friendly reference-free deconvolution methods that can unify or leverage case-specific information in the analysis of spatial transcriptomics data.
We propose a novel reference-free method based on regularized non-negative matrix factorization (NMF), named Flexible Analysis of Spatial Transcriptomics (FAST), that effectively incorporates gene expression, spatial, and histology information into a unified deconvolution framework. Compared to existing methods, FAST imposes fewer distributional assumptions, utilizes the spatial structure of tissues, and encourages interpretable factorization results. These features give FAST greater flexibility and accuracy, making it an effective tool for deciphering the complex cell-type composition of tissues and advancing our understanding of various biological processes and diseases. Extensive simulation studies show that FAST outperforms other existing reference-free methods. In real data applications, FAST uncovers the underlying tissue structures and identifies the corresponding marker genes.
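To make the idea of spatially regularized NMF deconvolution concrete, the following is a minimal sketch under stated assumptions. It is not FAST's objective, which this abstract does not specify (in particular, no histology term is modelled). Instead it uses the generic graph-regularized NMF formulation, minimizing ||X − WH||²_F + λ·tr(H L Hᵀ), where X is a genes × spots expression matrix, W holds K cell-type expression signatures, H holds per-spot type loadings, and L = D − A is the Laplacian of a k-nearest-neighbour graph over spot coordinates. The helper names `knn_adjacency`, `graph_regularized_nmf`, and `objective`, the parameters `k` and `lam`, and the synthetic data are all hypothetical choices for illustration.

```python
import numpy as np

def knn_adjacency(coords, k=4):
    """Hypothetical helper: symmetric k-nearest-neighbour adjacency over spots."""
    n = coords.shape[0]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)          # exclude self-loops
    nbrs = np.argsort(dist, axis=1)[:, :k]  # indices of the k closest spots
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)               # symmetrize the graph

def objective(X, W, H, A, lam):
    """Reconstruction error plus spatial smoothness penalty tr(H L H^T)."""
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian of the spot graph
    fit = np.linalg.norm(X - W @ H, "fro") ** 2
    return fit + lam * np.trace(H @ L @ H.T)

def graph_regularized_nmf(X, A, n_types, lam=0.1, n_iter=500, seed=0, eps=1e-10):
    """Multiplicative updates for graph-regularized NMF (a generic sketch, not FAST).

    X: non-negative genes x spots matrix; A: spots x spots adjacency.
    Returns W (genes x K), H (K x spots) and the objective trace.
    """
    rng = np.random.default_rng(seed)
    g, n = X.shape
    W = rng.random((g, n_types))
    H = rng.random((n_types, n))
    D = np.diag(A.sum(axis=1))
    history = [objective(X, W, H, A, lam)]
    for _ in range(n_iter):
        # Standard Lee-Seung update for the signatures.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Neighbour term H @ A pulls adjacent spots toward similar loadings.
        H *= (W.T @ X + lam * H @ A) / (W.T @ W @ H + lam * H @ D + eps)
        history.append(objective(X, W, H, A, lam))
    return W, H, history

# Synthetic example: 30 spots on a 6 x 5 grid, 50 genes, 2 cell types whose
# mixing fraction varies smoothly from left to right.
rng = np.random.default_rng(1)
xs, ys = np.meshgrid(np.arange(6), np.arange(5))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
frac = coords[:, 0] / coords[:, 0].max()
H_true = np.vstack([frac, 1.0 - frac])
W_true = rng.gamma(2.0, 1.0, size=(50, 2))
X = W_true @ H_true

A = knn_adjacency(coords, k=4)
W, H, history = graph_regularized_nmf(X, A, n_types=2, lam=0.1)
props = H / H.sum(axis=0, keepdims=True)  # post hoc per-spot proportion estimates
```

The Laplacian penalty is one common way to encode "neighbouring spots tend to have similar composition"; normalizing the columns of H afterwards is a simple post hoc step to read loadings as proportions, and the recovered factors match the true types only up to permutation and scaling.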