Gavriilidis George I, Vasileiou Vasileios, Orfanou Aspasia, Ishaque Naveed, Psomopoulos Fotis
Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece.
Department of Molecular Biology and Genetics, Democritus University of Thrace, Alexandroupolis, Greece.
Comput Struct Biotechnol J. 2024 Apr 25;23:1886-1896. doi: 10.1016/j.csbj.2024.04.058. eCollection 2024 Dec.
Recent advances in single-cell omics technology have transformed the landscape of cellular and molecular research, enriching the scope and intricacy of cellular characterisation. Perturbation modelling seeks to comprehensively understand the effects of external influences such as disease onset, molecular knock-outs or external stimulants on cellular physiology, specifically on transcription factors, signal transducers, biological pathways and dynamic cell states. Machine and deep learning tools transform complex perturbational phenomena into algorithmically tractable tasks, enabling predictions from various types of single-cell datasets. However, the recent surge in tools and datasets makes it challenging for experimental biologists and computational scientists to keep track of advances in this rapidly expanding field of single-cell modelling. Here, we recapitulate the main objectives of perturbation modelling and summarise novel single-cell perturbation technologies, based on either genetic manipulation such as CRISPR or chemical compounds, spanning multiple omic modalities. We then concisely review a burgeoning group of computational methods, ranging from classical statistical inference methodologies to machine and deep learning architectures such as shallow models and autoencoders, to biologically informed approaches based on gene regulatory networks, and to combinatorial efforts reminiscent of ensemble learning. We also discuss the rising trend of large foundation models in single-cell perturbation modelling, inspired by large language models. Lastly, we critically assess the challenges that underlie single-cell perturbation modelling and point towards relevant future perspectives, such as perturbation atlases, multi-omics and spatial datasets, causal machine learning for interpretability, multi-task learning for performance and explainability, and prospects for solving interoperability and benchmarking pitfalls.