Li Hechen, Zhang Ziqi, Squires Michael, Chen Xi, Zhang Xiuwei
Georgia Institute of Technology, Atlanta, GA, USA.
Southern University of Science and Technology, Shenzhen, China.
Nat Methods. 2025 May;22(5):982-993. doi: 10.1038/s41592-025-02651-0. Epub 2025 Apr 17.
Simulated single-cell data are essential for designing and evaluating computational methods in the absence of experimental ground truth. Here we present scMultiSim, a comprehensive simulator that generates multimodal single-cell data encompassing gene expression, chromatin accessibility, RNA velocity and spatial cell locations while accounting for the relationships between modalities. Unlike existing tools that focus on limited biological factors, scMultiSim simultaneously models cell identity, gene regulatory networks, cell-cell interactions and chromatin accessibility while incorporating technical noise. Moreover, it allows users to adjust each factor's effect easily. Here we show that scMultiSim generates data with expected biological effects, and demonstrate its applications by benchmarking a wide range of computational tasks, including multimodal and multi-batch data integration, RNA velocity estimation, gene regulatory network inference and cell-cell interaction inference using spatially resolved gene expression data. Compared to existing simulators, scMultiSim can benchmark a much broader range of existing computational problems and even new potential tasks.