VarNMF: non-negative probabilistic factorization with source variation.

Authors

Fallik Ela, Friedman Nir

Affiliations

School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel.

Lautenberg Center for Immunology and Cancer Research, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, 9112102, Israel.

Publication information

Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btae758.

Abstract

MOTIVATION

Non-negative matrix factorization (NMF) is a powerful tool often applied to genomic data to identify non-negative latent components that constitute linearly mixed samples. It is useful when the observed signal combines contributions from multiple sources, such as cell types in bulk measurements of heterogeneous tissue. NMF accounts for two types of variation between samples: disparities in the proportions of sources, and observation noise. However, in many settings, there is also non-trivial variation between samples in the contribution of each source to the mixed data. This variation cannot be accurately modeled using the NMF framework.
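
As an illustration of the standard NMF setting described above, the following minimal sketch factors a matrix of mixed samples V into non-negative mixing proportions W and a single set of source profiles H that is shared by all samples. It uses the classical multiplicative updates for the squared-error objective; the function name `nmf_multiplicative`, the toy dimensions, and the noise model are assumptions made for this example and are not taken from the VarNMF code.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, seed=0, eps=1e-10):
    """Factor V (samples x features) as W @ H with W, H >= 0.

    Multiplicative updates for the squared (Frobenius) error.
    Illustrative helper, not part of the VarNMF code base.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = V.shape
    W = rng.random((n_samples, k)) + eps   # per-sample source proportions
    H = rng.random((k, n_features)) + eps  # one fixed profile per source
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy mixed data: 40 samples, 3 sources, 25 features (all values hypothetical).
rng = np.random.default_rng(1)
W_true = rng.dirichlet(np.ones(3), size=40)    # proportions sum to 1 per sample
H_true = rng.uniform(0.0, 10.0, size=(3, 25))  # source profiles, shared by all samples
V = W_true @ H_true + rng.uniform(0.0, 0.1, size=(40, 25))  # small non-negative noise

W_hat, H_hat = nmf_multiplicative(V, k=3)
rel_err = np.linalg.norm(V - W_hat @ H_hat) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Note that every sample is explained by the same matrix H here; only the proportions in W and the noise differ between samples, which is exactly the limitation the abstract points out.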

RESULTS

We present VarNMF, a probabilistic extension of NMF that explicitly models this variation in source values. We show that by modeling sources as non-negative distributions, we can recover source variation directly from mixed samples without observing any of the sources directly. We apply VarNMF to a cell-free ChIP-seq dataset of two cancer cohorts and a healthy cohort, demonstrating that VarNMF provides a better estimation of the data distribution. Moreover, VarNMF extracts cancer-associated source distributions that decouple the tumor characteristics from the amount of tumor contribution, and identify patient-specific disease behaviors. This decomposition highlights the inter-tumor variability that is obscured in the mixed samples.
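
To make the idea of source variation concrete, the sketch below simulates mixed samples in two ways: with fixed source profiles (the NMF assumption) and with per-sample source values drawn from a non-negative distribution around the same mean profiles. The abstract does not state which distribution family or noise model is used, so the Gamma sources, the Poisson observation noise, the `shape` parameter, and all dimensions are assumptions chosen only for illustration. This is a simulation of the generative idea, not the VarNMF inference procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_sources, n_features = 500, 2, 4  # hypothetical sizes

# Per-sample mixing proportions (each row sums to 1).
W = rng.dirichlet(np.ones(n_sources), size=n_samples)           # (n, k)
# Mean profile of each source.
H_mean = rng.uniform(5.0, 50.0, size=(n_sources, n_features))   # (k, m)

# NMF-style data: every sample mixes the same fixed source profiles.
V_fixed = rng.poisson(W @ H_mean)

# Source-variation data: each sample gets its own source values, drawn
# from a Gamma distribution (an assumption) whose mean equals H_mean.
shape = 2.0  # hypothetical; smaller values give more variation
H_samples = rng.gamma(shape, H_mean / shape,
                      size=(n_samples, n_sources, n_features))  # (n, k, m)
# Mix sample n's own sources with sample n's proportions.
rate_var = np.einsum("nk,nkm->nm", W, H_samples)
V_var = rng.poisson(rate_var)

print("per-feature variance, fixed sources:   ", V_fixed.var(axis=0).round(1))
print("per-feature variance, varying sources: ", V_var.var(axis=0).round(1))
```

Both datasets share the same proportions and mean source profiles, so the extra spread in the second one comes only from variation in the sources themselves, which is the component a single shared H cannot represent.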

AVAILABILITY AND IMPLEMENTATION

Code is available at https://github.com/Nir-Friedman-Lab/VarNMF.
