Bing Xin, Lovelace Tyler, Bunea Florentina, Wegkamp Marten, Kasturi Sudhir Pai, Singh Harinder, Benos Panayiotis V, Das Jishnu
Department of Statistics and Data Science, Cornell University, Ithaca, NY, USA.
Department of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA.
Patterns (N Y). 2022 Mar 24;3(5):100473. doi: 10.1016/j.patter.2022.100473. eCollection 2022 May 13.
High-dimensional cellular and molecular profiling of biological samples highlights the need for analytical approaches that can integrate multi-omic datasets to generate prioritized causal inferences. Current methods are limited by the high dimensionality of the combined datasets, the differences in their data distributions, and the difficulty of integrating them to infer causal relationships. Here, we present Essential Regression (ER), a novel interpretable machine-learning approach based on latent-factor regression. ER addresses these problems by identifying latent factors and their likely cause-effect relationships with system-wide outcomes or properties of interest. ER can integrate many multi-omic datasets without structural or distributional assumptions regarding the data, and it outperforms a range of state-of-the-art methods in prediction. ER can also be coupled with probabilistic graphical modeling, thereby strengthening the causal inferences. We demonstrate the utility of ER on multi-omic systems immunology datasets, using it to generate and validate novel cellular and molecular inferences in a wide range of contexts, including immunosenescence and immune dysregulation.