An augmented GSNMF model for complete deconvolution of bulk RNA-seq data.

Authors

Li Yujie, Xu Su, Wang Xue, Ertekin-Taner Nilüfer, Chen Duan

Affiliations

Department of Mathematics and Statistics, University of North Carolina at Charlotte, USA.

School of Data Science, University of North Carolina at Charlotte, USA.

Publication information

Math Biosci Eng. 2025 Mar 14;22(4):988-1018. doi: 10.3934/mbe.2025036.

DOI: 10.3934/mbe.2025036
PMID: 40296800
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12043048/

Abstract

Complete deconvolution of bulk RNA-seq data, which aims to recover both cell-type-specific gene expression profiles (GEPs) and relative cell abundances, is a challenging task. One of the fundamental models used, nonnegative matrix factorization (NMF), is mathematically ill-posed. Although several complete deconvolution methods have been developed, and their estimates appear promising when compared with ground truth on some datasets, a comprehensive understanding of how to circumvent the ill-posedness and improve solution accuracy is lacking. In this paper, we first investigated the requirements a given dataset must meet to satisfy the solvability conditions in NMF theory. Even when the solvability conditions hold, the "unique" solutions of NMF are determined only up to a rescaling matrix. Therefore, we provided estimates of the converged local minima and of the possible rescaling matrix, based on informative initial conditions. Using these strategies, we developed a new pipeline, a pseudo-bulk tissue data augmented, geometric structure guided NMF model (GSNMF+). In our approach, pseudo-bulk tissue data were generated from single-cell RNA-seq (scRNA-seq) data and pseudo cellular compositions simulated from a statistical distribution, and then mixed with the original dataset. The constituent matrices of the resulting hybrid dataset satisfy the weak solvability conditions of NMF. Furthermore, an estimated rescaling matrix was used to adjust the minimizer of the NMF, which is expected to reduce the root mean square errors of the solutions. Our algorithms were tested on several realistic bulk-tissue datasets and showed significant improvements in scenarios with singular cellular compositions.
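
Two ideas in the abstract can be illustrated with a small NumPy sketch: the rescaling ambiguity of an NMF factorization, and the way mixing in pseudo-bulk samples can repair a degenerate composition matrix. This is not the GSNMF+ pipeline itself. The matrix names (`G`, `C`, `D`), the toy dimensions, the Dirichlet distribution for simulated compositions, and the reading of "singular" as rank-deficient are all assumptions made for illustration. The profile matrix `G` stands in for profiles that would in practice be derived from scRNA-seq data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_types, n_samples = 50, 3, 8   # toy sizes (assumed)

# Hypothetical cell-type-specific expression profiles (genes x cell types).
G = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_types))

# A degenerate case: every bulk sample has the same cell proportions,
# so the composition matrix (cell types x samples) has rank 1.
base = np.array([0.6, 0.3, 0.1])
C = np.tile(base[:, None], (1, n_samples))
X = G @ C                                 # bulk expression, genes x samples

# 1) Rescaling ambiguity: for any positive diagonal D,
#    (G D)(D^{-1} C) reproduces X exactly, so X alone cannot fix the scale.
D = np.diag([2.0, 0.5, 4.0])
G_rescaled = G @ D
C_rescaled = np.linalg.inv(D) @ C
same_product = np.allclose(X, G_rescaled @ C_rescaled)

# 2) Augmentation: simulate pseudo compositions (Dirichlet is an assumed
#    choice), build pseudo-bulk samples, and append them to the data.
n_pseudo = 20
C_pseudo = rng.dirichlet(alpha=np.ones(n_types), size=n_pseudo).T
X_pseudo = G @ C_pseudo
X_hybrid = np.hstack([X, X_pseudo])
C_hybrid = np.hstack([C, C_pseudo])

rank_before = np.linalg.matrix_rank(C)
rank_after = np.linalg.matrix_rank(C_hybrid)

print("rescaled factors give the same X:", same_product)
print("rank of compositions before/after augmentation:", rank_before, rank_after)
```

In this toy setting the original composition matrix carries no information that could separate the three cell types, while the hybrid matrix has full row rank. Full rank is only a necessary ingredient here, not a statement of the paper's weak solvability conditions, which are defined in the full text.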

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30ae/12043048/11e676deda88/nihms-1998670-f0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30ae/12043048/9e78d729f143/nihms-1998670-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30ae/12043048/271bfa6dfa21/nihms-1998670-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30ae/12043048/56eb89ddcb96/nihms-1998670-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30ae/12043048/20e071b0ac7e/nihms-1998670-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30ae/12043048/9d17ec2caea4/nihms-1998670-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30ae/12043048/576039953911/nihms-1998670-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30ae/12043048/cc5569dd1453/nihms-1998670-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/30ae/12043048/55c3830a6179/nihms-1998670-f0008.jpg

Similar articles

1
Geometric structure guided model and algorithms for complete deconvolution of gene expression data.
Found Data Sci. 2022 Sep;4(3):441-466. doi: 10.3934/fods.2022013.
2
Deconvolution from bulk gene expression by leveraging sample-wise and gene-wise similarities and single-cell RNA-Seq data.
BMC Genomics. 2024 Sep 18;25(1):875. doi: 10.1186/s12864-024-10728-x.
3
Assessing transcriptomic heterogeneity of single-cell RNASeq data by bulk-level gene expression data.
BMC Bioinformatics. 2024 Jun 12;25(1):209. doi: 10.1186/s12859-024-05825-3.
4
Semi-supervised Nonnegative Matrix Factorization for gene expression deconvolution: a case study.
Infect Genet Evol. 2012 Jul;12(5):913-21. doi: 10.1016/j.meegid.2011.08.014. Epub 2011 Sep 10.
5
Transcriptome size matters for single-cell RNA-seq normalization and bulk deconvolution.
Nat Commun. 2025 Feb 1;16(1):1246. doi: 10.1038/s41467-025-56623-1.
6
SimBu: bias-aware simulation of bulk RNA-seq data with variable cell-type composition.
Bioinformatics. 2022 Sep 16;38(Suppl_2):ii141-ii147. doi: 10.1093/bioinformatics/btac499.
7
MuSiC2: cell-type deconvolution for multi-condition bulk RNA-seq data.
Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac430.
8
NNICE: a deep quantile neural network algorithm for expression deconvolution.
Sci Rep. 2024 Jun 18;14(1):14040. doi: 10.1038/s41598-024-65053-w.
9
Effective methods for bulk RNA-seq deconvolution using scnRNA-seq transcriptomes.
Genome Biol. 2023 Aug 1;24(1):177. doi: 10.1186/s13059-023-03016-6.
