GEOMETRIC STRUCTURE GUIDED MODEL AND ALGORITHMS FOR COMPLETE DECONVOLUTION OF GENE EXPRESSION DATA.

Author information

Chen Duan, Li Shaoyu, Wang Xue

Affiliations

Department of Mathematics and Statistics, School of Data Science, University of North Carolina at Charlotte, USA.

Department of Mathematics and Statistics, University of North Carolina at Charlotte, USA.

Publication information

Found Data Sci. 2022 Sep;4(3):441-466. doi: 10.3934/fods.2022013.

DOI: 10.3934/fods.2022013

PMID: 38250319

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10798655/

Abstract

Complete deconvolution analysis of bulk RNA-seq data is important because it helps distinguish whether differences in disease-associated gene expression profiles (GEPs) between tissues of patients and normal controls are due to changes in the cellular composition of the tissue samples or to GEP changes in specific cells. One of the major techniques for complete deconvolution is nonnegative matrix factorization (NMF), which also has a wide range of applications in the machine learning community. However, NMF is well known to be a strongly ill-posed problem, so applying it directly to RNA-seq data leads to severe difficulties in interpreting the solutions. In this paper, we develop an NMF-based mathematical model and corresponding computational algorithms to improve solution identifiability when deconvoluting bulk RNA-seq data. Our approach combines the biological concept of marker genes with the solvability conditions of NMF theory to obtain a geometric-structure-guided optimization model. In this strategy, the geometric structure of the bulk tissue data is first explored using spectral clustering. The identified marker-gene information is then integrated as solvability constraints, while the overall correlation graph is used for manifold regularization. Both synthetic and biological data are used to validate the proposed model and algorithms, and the results show significantly improved solution interpretability and accuracy.
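
The ill-posedness of NMF noted in the abstract can be illustrated with a minimal sketch. The code below is generic plain NMF with Lee-Seung multiplicative updates on synthetic data; it is not the authors' geometric-structure-guided model. All names (`nmf_multiplicative`, `W_true`, `H_true`), the matrix sizes, and the synthetic data are assumptions made purely for illustration. It shows that a factorization X ≈ W H can be rescaled into a different nonnegative pair with exactly the same fit, which is one reason extra constraints such as marker genes are needed.

```python
import numpy as np


def nmf_multiplicative(X, k, n_iter=500, eps=1e-10, seed=0):
    """Plain NMF, X ~ W @ H, via multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    # Strictly positive random initialisation keeps the updates well defined.
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay >= 0.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic "bulk" data (assumed sizes): genes x samples = signatures @ proportions.
rng = np.random.default_rng(1)
n_genes, n_cell_types, n_samples = 60, 3, 20
W_true = rng.random((n_genes, n_cell_types))                      # cell-type GEPs
H_true = rng.dirichlet(np.ones(n_cell_types), size=n_samples).T  # columns sum to 1
X = W_true @ H_true

W, H = nmf_multiplicative(X, n_cell_types)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)

# Non-uniqueness: any positive diagonal rescaling D gives another valid
# nonnegative factorization with an identical reconstruction.
D = np.diag([2.0, 0.5, 3.0])
W_alt, H_alt = W @ D, np.linalg.inv(D) @ H
same_fit = np.allclose(W @ H, W_alt @ H_alt)

print(f"relative reconstruction error: {rel_err:.4f}")
print(f"rescaled factors give the same fit: {same_fit}")
```

Scaling is only the simplest ambiguity; more general invertible transformations that preserve nonnegativity can also leave the fit unchanged, which is the kind of non-identifiability that solvability constraints aim to remove.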

Similar articles

Convex nonnegative matrix factorization with manifold regularization.

Neural Netw. 2015 Mar;63:94-103. doi: 10.1016/j.neunet.2014.11.007. Epub 2014 Dec 4.

Manifold Peaks Nonnegative Matrix Factorization.

IEEE Trans Neural Netw Learn Syst. 2024 May;35(5):6850-6862. doi: 10.1109/TNNLS.2022.3212922. Epub 2024 May 2.

Semi-supervised Nonnegative Matrix Factorization for gene expression deconvolution: a case study.

Infect Genet Evol. 2012 Jul;12(5):913-21. doi: 10.1016/j.meegid.2011.08.014. Epub 2011 Sep 10.

Hessian regularization based symmetric nonnegative matrix factorization for clustering gene expression and microbiome data.

Methods. 2016 Dec 1;111:80-84. doi: 10.1016/j.ymeth.2016.06.017. Epub 2016 Jun 20.

A Robust Manifold Graph Regularized Nonnegative Matrix Factorization Algorithm for Cancer Gene Clustering.

Molecules. 2017 Dec 2;22(12):2131. doi: 10.3390/molecules22122131.

Dual-Graph Global and Local Concept Factorization for Data Clustering.

IEEE Trans Neural Netw Learn Syst. 2022 Jun 2;PP. doi: 10.1109/TNNLS.2022.3177433.

Nonnegative Matrix Factorization with Rank Regularization and Hard Constraint.

Neural Comput. 2017 Sep;29(9):2553-2579. doi: 10.1162/neco_a_00995. Epub 2017 Aug 4.

A robust semi-supervised NMF model for single cell RNA-seq data.

PeerJ. 2020 Oct 16;8:e10091. doi: 10.7717/peerj.10091. eCollection 2020.

Hessian regularization based non-negative matrix factorization for gene expression data clustering.

Annu Int Conf IEEE Eng Med Biol Soc. 2015;2015:4130-3. doi: 10.1109/EMBC.2015.7319303.

Cited by

Robustness and resilience of computational deconvolution methods for bulk RNA sequencing data.

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf264.

Analyzing Single Cell RNA Sequencing with Topological Nonnegative Matrix Factorization.

J Comput Appl Math. 2024 Aug 1;445. doi: 10.1016/j.cam.2024.115842. Epub 2024 Feb 19.

A hybrid stochastic interpolation and compression method for kernel matrices.

J Comput Phys. 2023 Dec 1;494. doi: 10.1016/j.jcp.2023.112491. Epub 2023 Sep 12.
