Rao Jingyou, Wang Mingsen, Howard Matthew, Coyote-Maestas Willow, Pimentel Harold
Department of Computer Science, UCLA, Los Angeles, CA, USA.
Department of Mathematics, Baruch College, CUNY, New York City, NY, USA.
bioRxiv. 2025 Aug 4:2025.08.01.667517. doi: 10.1101/2025.08.01.667517.
Multi-phenotype deep mutational scanning (DMS) experiments provide a powerful means to dissect how protein variants affect different layers of molecular function, such as abundance, surface expression, and ligand binding. When these phenotypes are connected through a molecular pathway, interpreting variant effects becomes challenging because downstream phenotypes often reflect both direct and indirect consequences of mutation. We introduce Cosmos, a Bayesian framework for residue-level causal inference in multi-phenotype DMS data. Cosmos addresses three key questions: (1) whether a causal relationship exists between two phenotypes; (2) how strong that relationship is; and (3) what the downstream phenotype would be expected to be if the upstream phenotype were normalized, which enables counterfactual interpretation. The framework uses position-level aggregation and Bayesian model selection to infer interpretable causal structures without requiring phenotype-specific biophysical assumptions. We apply Cosmos to three datasets, Kir2.1 (abundance and surface expression), PSD95-PDZ3 (abundance and CRIPT binding), and KRAS (abundance and RAF1-RBD binding), and show that it effectively distinguishes direct from indirect functional effects. Across these applications, Cosmos provides a generalizable and interpretable approach to disentangling causal relationships in high-throughput protein functional screens.
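The three questions listed in the abstract can be illustrated with a small toy sketch. This is not the Cosmos implementation, which the abstract does not specify. It assumes a simple linear link between position-level mean scores, uses a BIC approximation to the Bayes factor in place of a full Bayesian treatment, and takes an upstream score of 0 as the "normalized" reference. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def aggregate_by_position(positions, scores):
    """Mean score per residue position (one illustrative aggregation choice)."""
    positions = np.asarray(positions)
    scores = np.asarray(scores, dtype=float)
    uniq = np.unique(positions)
    means = np.array([scores[positions == p].mean() for p in uniq])
    return uniq, means

def gaussian_bic(residuals, n_params):
    """BIC of a Gaussian model whose variance is set to its ML estimate."""
    n = residuals.size
    sigma2 = np.mean(residuals ** 2)
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return n_params * np.log(n) - 2.0 * log_lik

def compare_models(x, y):
    """Compare 'no link' (M0) against 'linear link' (M1) between phenotypes."""
    # M0: y = mu + noise (parameters: mu, sigma^2)
    bic0 = gaussian_bic(y - y.mean(), n_params=2)
    # M1: y = a + b * x + noise (parameters: a, b, sigma^2)
    slope, intercept = np.polyfit(x, y, 1)
    bic1 = gaussian_bic(y - (intercept + slope * x), n_params=3)
    # exp(-(BIC1 - BIC0) / 2) approximates the Bayes factor of M1 over M0.
    log_bf = -0.5 * (bic1 - bic0)
    # Posterior probability of M1 under equal prior odds, computed stably.
    prob_causal = float(np.exp(-np.logaddexp(0.0, -log_bf)))
    return {"slope": float(slope), "intercept": float(intercept),
            "log_bf": float(log_bf), "prob_causal": prob_causal}

def counterfactual_downstream(x, y, slope, x_ref=0.0):
    """Downstream score expected if the upstream score were set to x_ref."""
    return y - slope * (x - x_ref)

# Synthetic demonstration data (made up for illustration only):
# 50 positions, 5 variants each, downstream = 0.8 * upstream + noise.
rng = np.random.default_rng(0)
positions = np.repeat(np.arange(50), 5)
upstream = np.repeat(rng.normal(size=50), 5) + rng.normal(scale=0.1, size=250)
downstream = 0.8 * upstream + rng.normal(scale=0.1, size=250)

pos, x_pos = aggregate_by_position(positions, upstream)
_, y_pos = aggregate_by_position(positions, downstream)

fit = compare_models(x_pos, y_pos)           # questions (1) and (2)
y_cf = counterfactual_downstream(x_pos, y_pos, fit["slope"])  # question (3)
```

In this sketch, `prob_causal` stands in for question (1), `slope` for question (2), and `y_cf` for question (3): positions whose counterfactual downstream score remains far from zero would be candidates for a direct effect on the downstream phenotype rather than one mediated by the upstream phenotype.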