Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China.
University of Chinese Academy of Sciences, Beijing, 100049, China.
Adv Sci (Weinh). 2024 May;11(19):e2307835. doi: 10.1002/advs.202307835. Epub 2024 Mar 14.
Transformer-based models have revolutionized single-cell RNA-seq (scRNA-seq) data analysis. However, their applicability is challenged by the complexity and scale of single-cell multi-omics data. Here, a novel single-cell multi-modal/multi-task transformer (scmFormer) is proposed to fill the existing gap in integrating single-cell proteomics with other omics data. Systematic benchmarking demonstrates that scmFormer excels at integrating large-scale single-cell multimodal data and heterogeneous multi-batch paired multi-omics data, while preserving both the information shared across batches and distinct biological information. In transferring cell-type labels from single-cell transcriptomics to proteomics data, scmFormer achieves an average F1 score 54.5% higher than that of the second-best method. Using COVID-19 datasets, it is shown that scmFormer successfully integrates over 1.48 million cells on a personal computer. Moreover, scmFormer is shown to outperform existing methods at generating the unmeasured modality and to be well suited for spatial multi-omics data. Thus, scmFormer is a powerful and comprehensive tool for analyzing single-cell multi-omics data.
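As an illustration of the label-transfer metric mentioned in the abstract, the following is a minimal sketch of how an average F1 score over cell types could be computed. It assumes the average is an unweighted (macro) mean of per-class F1 scores, which the abstract does not specify. The function name `macro_f1` and the toy cell-type labels are hypothetical and are not taken from scmFormer.

```python
from collections import Counter


def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of per-class F1 over the classes present in true_labels.

    Assumption: "average F1" means macro averaging; the abstract does not say.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1   # correct call for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = []
    for cls in sorted(set(true_labels)):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[cls] + fp[cls] + fn[cls]
        scores.append(2 * tp[cls] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy example: reference labels vs. transferred labels
reference = ["B", "B", "T", "T", "NK", "NK"]
transferred = ["B", "B", "T", "NK", "NK", "NK"]
score = macro_f1(reference, transferred)
print(round(score, 3))
```

In this toy case the per-class scores are 1.0 (B), 2/3 (T), and 0.8 (NK), so the macro average is roughly 0.822. Macro averaging weights rare and abundant cell types equally, which is one reason it is a common choice for label-transfer benchmarks; a weighted average would give a different number.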