Garbulowski Mateusz, Hillerton Thomas, Morgan Daniel, Seçilmiş Deniz, Sonnhammer Lisbet, Tjärnberg Andreas, Nordling Torbjörn E M, Sonnhammer Erik L L
Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, Solna 171 21, Sweden.
Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala 751 85, Sweden.
NAR Genom Bioinform. 2024 Sep 18;6(3):lqae121. doi: 10.1093/nargab/lqae121. eCollection 2024 Sep.
Single-cell data is increasingly used for gene regulatory network (GRN) inference, and benchmarks for this task have been developed using simulated data. However, existing single-cell simulators cannot model the effects of gene perturbations. A further challenge is generating large-scale GRNs, a process that often runs into computational and stability problems. We present GeneSPIDER2, an update of the GeneSPIDER MATLAB toolbox for GRN benchmarking, inference, and analysis. Several software modules have improved capabilities and performance, and new functionality has been added. A major improvement is the ability to generate large GRNs with biologically realistic topological properties, namely a scale-free degree distribution and modularity. Another major addition is the simulation of single-cell data, which is becoming increasingly popular as input for GRN inference. Specifically, GeneSPIDER2 can generate single-cell data based on genetic perturbations, a feature that existing single-cell simulators lack. Finally, the simulated single-cell data was compared to real single-cell Perturb-seq data from two cell lines, showing that the synthetic and real data exhibit similar properties.
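The two ideas named in the abstract, a scale-free GRN and perturbation-based single-cell data, can be illustrated with a minimal Python sketch. This is not GeneSPIDER2 code (the toolbox is written in MATLAB) and it does not reproduce its algorithms. Everything here is an assumption made for illustration: the function names, preferential attachment as the source of a heavy-tailed out-degree, a linear model dx/dt = A x + p with a diagonally dominant A for stability, a unit knockdown as the perturbation, and Poisson noise for the per-cell counts. Modularity is not modeled.

```python
import math
import random


def scale_free_grn(n_genes, rng, links_per_gene=2):
    """Hypothetical generator: A[i][j] is the effect of gene j on gene i.

    Each new gene picks regulators among earlier genes with probability
    proportional to (out-degree + 1), i.e. preferential attachment.
    """
    A = [[0.0] * n_genes for _ in range(n_genes)]
    out_degree = [0] * n_genes
    for target in range(1, n_genes):
        candidates = list(range(target))
        weights = [out_degree[j] + 1 for j in candidates]
        regulators = set()
        while len(regulators) < min(links_per_gene, target):
            regulators.add(rng.choices(candidates, weights=weights)[0])
        for reg in sorted(regulators):
            A[target][reg] = rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 1.0)
            out_degree[reg] += 1
    # Negative, dominant self-loops keep the linear system stable (assumption).
    for i in range(n_genes):
        A[i][i] = -(sum(abs(v) for v in A[i]) + 1.0)
    return A


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def knockdown_response(A, gene):
    """Steady state of dx/dt = A x + p for a unit knockdown p = -e_gene."""
    p = [0.0] * len(A)
    p[gene] = -1.0
    return solve(A, [-v for v in p])  # A x = -p


def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for the small means used here."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1


def simulate_knockdown_cells(A, gene, n_cells, rng, base_mean=5.0):
    """Counts per cell: the response is read as a log2 fold change."""
    means = [base_mean * 2.0 ** v for v in knockdown_response(A, gene)]
    return [[poisson(m, rng) for m in means] for _ in range(n_cells)]


rng = random.Random(0)
A = scale_free_grn(30, rng)
cells = simulate_knockdown_cells(A, gene=3, n_cells=20, rng=rng)
print(len(cells), "cells x", len(cells[0]), "genes")
```

In this toy model the knocked-down gene always ends up with a negative steady-state response, and the effect reaches only genes downstream of it in the network. The real toolbox offers far more control over topology, noise, and experimental design than this sketch suggests.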