Benjamin Ari S, Zador Anthony
bioRxiv. 2025 Aug 19:2025.08.17.670735. doi: 10.1101/2025.08.17.670735.
Single-cell RNA sequencing technologies have enabled unprecedented insights into gene expression and are poised to transform clinical diagnostics. At present, most computational approaches for interpreting single-cell data operate at the level of individual cells, predicting labels or properties from isolated transcriptomic profiles. This approach overlooks a key class of signals: the composition of cells within a sample or defined population. Such signals are often critical for inferring tissue identity, disease state, or other sample-level phenotypes. To address this limitation, we introduce TissueFormer, a Transformer-based neural network that analyzes groups of single-cell RNA profiles to infer population-level labels while retaining single-cell resolution. When applied to predicting the cortical area of groups of cells sampled from spatial transcriptomic data from mouse brains, TissueFormer outperformed both single-cell foundation models and machine learning methods applied to pseudobulk expression and cell type composition. This improved performance enables the automated construction of high-resolution brain region maps in individual animals directly from spatial transcriptomic data. More broadly, TissueFormer provides a framework for predicting any population-level phenotype that is influenced by cellular diversity and tissue-level organization.
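The two group-level baselines named in the abstract, pseudobulk and cell type composition, can be illustrated with a minimal sketch. The abstract does not say how these features were computed, so everything here is an assumption for illustration: the function names, the toy values, the cell type labels, and the choice to average expression rather than sum it (both are common conventions for pseudobulk).

```python
from collections import Counter


def pseudobulk(expression_rows):
    """Average each gene's expression over all cells in the group.

    Assumption: mean aggregation; summing counts is another common choice.
    """
    n_cells = len(expression_rows)
    n_genes = len(expression_rows[0])
    return [
        sum(row[g] for row in expression_rows) / n_cells
        for g in range(n_genes)
    ]


def cell_type_composition(cell_types, vocabulary):
    """Fraction of cells in the group that belong to each cell type."""
    counts = Counter(cell_types)
    total = len(cell_types)
    return [counts[t] / total for t in vocabulary]


# Toy group of four cells measured on three hypothetical genes.
group_expression = [
    [4.0, 0.0, 1.0],
    [2.0, 0.0, 3.0],
    [0.0, 6.0, 1.0],
    [2.0, 2.0, 3.0],
]
# Hypothetical per-cell type labels, in the same order as the rows above.
group_cell_types = ["excitatory", "excitatory", "inhibitory", "astrocyte"]
vocabulary = ["astrocyte", "excitatory", "inhibitory"]

pseudobulk_features = pseudobulk(group_expression)
composition_features = cell_type_composition(group_cell_types, vocabulary)

print(pseudobulk_features)    # one value per gene
print(composition_features)   # one value per cell type in `vocabulary`
```

Both summaries collapse the group into a single vector: pseudobulk discards which cell expressed what, and composition discards expression within each type. A model that takes the full set of per-cell profiles as input, as the abstract describes for TissueFormer, does not have to make that trade-off.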