Long James P, Yang Yumeng, Shimizu Shohei, Pham Thong, Do Kim-Anh
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
BMC Bioinformatics. 2025 Jan 7;26(1):4. doi: 10.1186/s12859-024-06027-7.
In cell line perturbation experiments, a collection of cells is perturbed with external agents and responses such as protein expression are measured. Due to cost constraints, only a small fraction of all possible perturbations can be tested in vitro. This has led to the development of computational models that predict cellular responses to perturbations in silico. A central challenge for these models is predicting the effect of new perturbations that were not used in the training data. Here we propose causal structural equations for modeling how perturbations affect cells. From this model, we derive two estimators for predicting responses: a Linear Regression (LR) estimator and a causal structure learning estimator that we term Causal Structure Regression (CSR). The CSR estimator requires more assumptions than LR, but can predict the effects of drugs that were not applied in the training data. Next, we present Cellbox, a recently proposed model based on a system of ordinary differential equations (ODEs) that obtained the best prediction performance on a melanoma cell line perturbation data set (Yuan et al. in Cell Syst 12:128-140, 2021). We derive analytic results that show a close connection between CSR and Cellbox, providing a new causal interpretation for the Cellbox model. We compare LR and CSR/Cellbox in simulations, highlighting the strengths and weaknesses of the two approaches. Finally, we compare the performance of LR and CSR/Cellbox on the benchmark melanoma data set. We find that the LR model has comparable or slightly better performance than Cellbox.
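The limitation of a regression-based estimator noted in the abstract can be illustrated with a small simulation. The sketch below is not the paper's model or code. It assumes a hypothetical linear structural equation x = Bx + Au + noise, where the matrices `B` (protein-to-protein effects) and `A` (drug-to-protein effects), the dimensions, and the noise level are all invented for illustration. Under that assumption the responses follow the reduced form x = (I - B)^{-1} A u + noise, and an LR-style estimator is an ordinary least-squares regression of responses on perturbations.

```python
import numpy as np

rng = np.random.default_rng(0)
n_proteins, n_drugs, n_train = 5, 3, 200

# Hypothetical ground truth (an assumption, not taken from the paper):
# B holds protein-to-protein effects, A holds drug-to-protein effects.
B = 0.1 * rng.normal(size=(n_proteins, n_proteins))
np.fill_diagonal(B, 0.0)
A = rng.normal(size=(n_proteins, n_drugs))

# Reduced-form drug effects: M_true = (I - B)^{-1} A.
M_true = np.linalg.solve(np.eye(n_proteins) - B, A)

# Training perturbations: the last drug (index 2) is never applied.
U_train = rng.uniform(0.0, 1.0, size=(n_train, n_drugs))
U_train[:, 2] = 0.0
X_train = U_train @ M_true.T + 0.05 * rng.normal(size=(n_train, n_proteins))

# LR-style estimator: least-squares regression of responses on perturbations.
# lstsq returns the minimum-norm solution when U_train is rank deficient.
M_hat = np.linalg.lstsq(U_train, X_train, rcond=None)[0].T

# Largest estimation error for drugs that were applied in training.
seen_error = np.abs(M_hat[:, :2] - M_true[:, :2]).max()
# Largest estimated coefficient for the drug that was never applied.
unseen_coef = np.abs(M_hat[:, 2]).max()

print(f"max error on applied drugs: {seen_error:.4f}")
print(f"max coefficient for the unapplied drug: {unseen_coef:.2e}")
```

In this toy setting the regression recovers the effects of the two applied drugs, while the column for the unapplied drug is not identified by the data and the minimum-norm solution sets it to zero. Predicting that drug's effect needs additional structural assumptions, which is the role the abstract assigns to CSR; the sketch does not implement CSR or Cellbox.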