Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA.
Computational Biology, Pacific Northwest National Laboratory, Richland, WA 99354, USA.
Bioinformatics. 2022 Jun 24;38(Suppl 1):i350-i358. doi: 10.1093/bioinformatics/btac251.
Estimating causal queries, such as changes in protein abundance in response to a perturbation, is a fundamental task in the analysis of biomolecular pathways. The estimation requires experimental measurements on the pathway components. However, in practice many pathway components are left unobserved (latent) because they are either unknown or difficult to measure. Latent variable models (LVMs) are well suited for such estimation. Unfortunately, LVM-based estimation of causal queries can be inaccurate when parameters of the latent variables are not uniquely identified, or when the number of latent variables is misspecified. This has limited the use of LVMs for causal inference in biomolecular pathways.
In this article, we propose a general and practical approach for LVM-based estimation of causal queries. We prove that, despite the challenges above, LVM-based estimators of causal queries are accurate if the queries are identifiable according to Pearl's do-calculus, and we describe an algorithm for their estimation. We illustrate the breadth and the practical utility of this approach for estimating causal queries in four synthetic and two experimental case studies, where structures of biomolecular pathways challenge the existing methods for causal query estimation.
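To make the notion of an identifiable causal query concrete, the following is a minimal sketch and not the algorithm described in the article. It assumes a hypothetical linear pathway in which a latent variable U confounds X and Y, and X acts on Y only through an observed mediator M. Under these assumptions the effect of do(X) on Y is identifiable by the front-door criterion, a standard consequence of do-calculus. All variable names, coefficients, and the sample size are illustrative assumptions.

```python
import numpy as np


def simulate(n, rng, a=1.0, b=0.8, c=1.5, d=2.0):
    """Draw samples from an assumed linear pathway U -> X -> M -> Y, U -> Y.

    U is latent: it is generated here but never returned to the estimator.
    """
    u = rng.normal(size=n)                  # latent confounder
    x = a * u + rng.normal(size=n)          # perturbed component
    m = b * x + rng.normal(size=n)          # observed mediator
    y = c * m + d * u + rng.normal(size=n)  # outcome
    return x, m, y


def naive_slope(x, y):
    """Regress Y on X directly; biased because U is ignored."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def front_door_slope(x, m, y):
    """Front-door estimate for a linear model.

    Effect of X on M, multiplied by the effect of M on Y adjusted for X.
    """
    b_hat = np.cov(x, m)[0, 1] / np.var(x, ddof=1)
    design = np.column_stack([m, x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return b_hat * coef[0]


rng = np.random.default_rng(0)
x, m, y = simulate(200_000, rng)

true_effect = 0.8 * 1.5  # b * c: change in E[Y] per unit change in do(X)
naive = naive_slope(x, y)
front_door = front_door_slope(x, m, y)

print(f"true effect:         {true_effect:.3f}")
print(f"naive regression:    {naive:.3f}")
print(f"front-door estimate: {front_door:.3f}")
```

In this toy setting the naive regression absorbs the confounding path through U, whereas the front-door estimate uses only observed variables and should land close to the true effect, even though U is never measured.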
The code and the data documenting all the case studies are available at https://github.com/srtaheri/LVMwithDoCalculus.
Supplementary data are available at Bioinformatics online.