Wang Linjie, Zhang Huixia, Yi Bo, Xie Weidong, Yu Kun, Li Wei, Li Keqin, Zhao Dazhe
School of Computer Science and Engineering, Northeastern University, 110819, Shenyang, China.
College of Medicine and Bioinformation Engineering, Northeastern University, 110819, Shenyang, China.
Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf157.
Single-cell multi-omics technologies have revolutionized the study of cell states and functions by simultaneously profiling multiple molecular layers within individual cells. However, existing methods for integrating these data struggle to preserve critical feature information and fail to exploit known regulatory knowledge, which is essential for understanding cell functions. These limitations hinder their ability to provide comprehensive and accurate insights into cells. Here, we propose FactVAE, a factorized variational autoencoder designed for robust and accurate analysis of single-cell multi-omics data. FactVAE integrates the factorization principle into the variational autoencoder framework, preserving feature information while using neural networks to capture sample information non-linearly. Additionally, known regulatory knowledge is incorporated during model training, and a knowledge transfer strategy is employed for cell embedding optimization and data augmentation. Comparative analyses of single-cell multi-omics datasets from different protocols and of a spatial multi-omics dataset demonstrate that FactVAE not only outperforms benchmark methods in clustering performance but also generates augmented data that reveal the clearest cell-type-specific motif expression. Moreover, the feature embeddings captured by FactVAE enable the inference of potential and reliable gene regulatory relationships. Overall, FactVAE's superior performance and strong scalability make it a promising new solution for single-cell multi-omics data analysis.
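To make the idea of combining factorization with a variational autoencoder more concrete, the following is a minimal NumPy sketch of one forward pass. It is not the FactVAE implementation: the abstract does not give the architecture, so the single hidden layer, the Gaussian likelihood, and all names (`init_params`, `forward`, `negative_elbo`, `V`) are assumptions made for illustration only. The point it shows is the general pattern: a non-linear encoder produces per-cell embeddings, while the decoder is a plain product with an explicit feature-embedding matrix, so each feature keeps its own interpretable vector.

```python
import numpy as np


def init_params(n_features, n_hidden, n_latent, seed=0):
    """Randomly initialise a toy factorized VAE (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        # Non-linear encoder: features -> hidden -> (mu, logvar)
        "W1": rng.normal(0.0, scale, (n_features, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W_mu": rng.normal(0.0, scale, (n_hidden, n_latent)),
        "W_logvar": rng.normal(0.0, scale, (n_hidden, n_latent)),
        # Explicit feature embeddings: one latent vector per feature.
        "V": rng.normal(0.0, scale, (n_features, n_latent)),
    }


def forward(params, X, rng):
    """Encode cells non-linearly, then decode by matrix factorization."""
    h = np.tanh(X @ params["W1"] + params["b1"])
    mu = h @ params["W_mu"]
    logvar = h @ params["W_logvar"]
    # Reparameterisation trick: Z = mu + sigma * eps
    eps = rng.standard_normal(mu.shape)
    Z = mu + np.exp(0.5 * logvar) * eps
    # Linear, factorization-style decoder: X_hat = Z V^T
    X_hat = Z @ params["V"].T
    return Z, mu, logvar, X_hat


def negative_elbo(X, X_hat, mu, logvar):
    """Unit-variance Gaussian reconstruction term plus KL to N(0, I)."""
    recon = 0.5 * np.sum((X - X_hat) ** 2, axis=1)
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)
    return float(np.mean(recon + kl))
```

In this sketch the rows of `Z` play the role of cell embeddings and the rows of `V` play the role of feature embeddings. How FactVAE actually parameterizes these, handles multiple omics layers, or injects regulatory knowledge is not stated in the abstract and is not modelled here.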