Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization

Authors

Jia Zhilong, Zhang Xiang, Guan Naiyang, Bo Xiaochen, Barnes Michael R, Luo Zhigang

Affiliations

Department of Chemistry and Biology, College of Science, National University of Defense Technology, Changsha, Hunan, P.R. China; William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom.

Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Changsha, Hunan, P.R. China.

Publication information

PLoS One. 2015 Sep 8;10(9):e0137782. doi: 10.1371/journal.pone.0137782. eCollection 2015.

DOI:10.1371/journal.pone.0137782

PMID:26348772

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4562600/

Abstract

RNA sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes; however, as dimensionality increases, accurate gene ranking becomes increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that applies discriminant non-negative matrix factorization (DNMF) to RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. By incorporating Fisher's discriminant criterion and setting the reduced dimension to two, DNMF learns two factors that approximate the original gene expression data, using the sample label information to abstract the up-regulated and down-regulated metagenes. The first factor holds each gene's weight in the two metagenes, each metagene being an additive combination of all genes, while the second factor represents the expression values of the two metagenes. In the gene ranking stage, all genes are sorted in descending order of the difference between their two metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost gene ranking performance. Area Under the Curve analysis of differential expression on two benchmarking tests covering four RNA-seq data sets with similar phenotypes showed that the proposed DNMF-based gene ranking method outperforms other widely used methods. Gene Set Enrichment Analysis likewise showed that DNMF outperforms the other methods. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest that DNMF is an effective method for differential gene expression analysis and gene ranking of RNA-seq data.
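The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' DNMF: the Fisher discriminant term is omitted, and plain Lee-Seung multiplicative-update NMF with a label-aware initialisation of the second factor is used as a stand-in. The function names `nmf_rank2` and `rank_genes`, the initial values, and the rescaling of the factors are all assumptions made for illustration.

```python
import numpy as np

def nmf_rank2(V, labels, n_iter=200, eps=1e-9):
    """Rank-2 NMF by multiplicative updates (a stand-in, NOT the paper's DNMF).

    V      : (genes x samples) non-negative expression matrix
    labels : (samples,) array of 0/1 class labels, used only to initialise H
    Returns W (genes x 2, gene weights) and H (2 x samples, metagene expression).
    """
    V = np.asarray(V, dtype=float)
    labels = np.asarray(labels)
    n_genes, _ = V.shape
    # Assumption: metagene k starts out active mainly in the samples of class k.
    H = np.vstack([labels == 0, labels == 1]).astype(float) + 0.1
    # Deterministic, strictly positive start for the gene weights.
    W = np.full((n_genes, 2), V.mean() + eps)
    for _ in range(n_iter):
        # Lee-Seung updates for the squared Frobenius error ||V - WH||^2.
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
        H *= (W.T @ V) / ((W.T @ W) @ H + eps)
    # Assumption: fix the scale ambiguity so the two columns of W are comparable.
    scale = H.max(axis=1) + eps
    H /= scale[:, None]
    W *= scale[None, :]
    return W, H

def rank_genes(W):
    """Sort gene indices by descending difference of the two metagene weights."""
    diff = W[:, 0] - W[:, 1]
    order = np.argsort(-diff, kind="stable")
    return order, diff
```

Under these assumptions, genes at the top of `order` weigh more heavily on the first metagene and genes at the bottom weigh more heavily on the second, so both ends of the list are candidates for differential expression. A faithful implementation would add the within-class and between-class scatter terms of Fisher's criterion to the objective, which this sketch does not attempt.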

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e65e/4562600/f22a96f9a1c9/pone.0137782.g001.jpg

Similar articles

A new discriminant NMF algorithm and its application to the extraction of subtle emotional differences in speech.

Cogn Neurodyn. 2012 Dec;6(6):525-35. doi: 10.1007/s11571-012-9213-1. Epub 2012 Jul 21.

Benchmarking differential expression analysis tools for RNA-Seq: normalization-based vs. log-ratio transformation-based methods.

BMC Bioinformatics. 2018 Jul 18;19(1):274. doi: 10.1186/s12859-018-2261-8.

Next-generation sequencing facilitates quantitative analysis of wild-type and Nrl(-/-) retinal transcriptomes.

Mol Vis. 2011;17:3034-54. Epub 2011 Nov 23.

SSNMDI: a novel joint learning model of semi-supervised non-negative matrix factorization and data imputation for clustering of single-cell RNA-seq data.

Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad149.

NBLDA: negative binomial linear discriminant analysis for RNA-Seq data.

BMC Bioinformatics. 2016 Sep 13;17(1):369. doi: 10.1186/s12859-016-1208-1.

Effective detection of variation in single-cell transcriptomes using MATQ-seq.

Nat Methods. 2017 Mar;14(3):267-270. doi: 10.1038/nmeth.4145. Epub 2017 Jan 16.

Differential expression analysis on RNA-Seq count data based on penalized matrix decomposition.

IEEE Trans Nanobioscience. 2014 Mar;13(1):12-8. doi: 10.1109/TNB.2013.2296978.

Differential Expression Analysis of RNA-seq Reads: Overview, Taxonomy, and Tools.

IEEE/ACM Trans Comput Biol Bioinform. 2020 Mar-Apr;17(2):566-586. doi: 10.1109/TCBB.2018.2873010. Epub 2018 Oct 1.

Full-length isoform concatenation sequencing to resolve cancer transcriptome complexity.

BMC Genomics. 2024 Jan 29;25(1):122. doi: 10.1186/s12864-024-10021-x.

Cited by

ICARus: a pipeline to extract robust gene expression signatures from transcriptome datasets.

Front Bioinform. 2025 Jun 19;5:1604418. doi: 10.3389/fbinf.2025.1604418. eCollection 2025.

Effects of Chronic Inflammatory Activation of Murine and Human Arterial Endothelial Cells at Normal Lipoprotein and Cholesterol Levels and .

Cells. 2024 Apr 30;13(9):773. doi: 10.3390/cells13090773.

Antagonistic Functions of Androgen Receptor and NF-κB in Prostate Cancer-Experimental and Computational Analyses.

Cancers (Basel). 2022 Dec 14;14(24):6164. doi: 10.3390/cancers14246164.

Performance of a scalable RNA extraction-free transcriptome profiling method for adherent cultured human cells.

Sci Rep. 2021 Sep 30;11(1):19438. doi: 10.1038/s41598-021-98912-x.

A robust semi-supervised NMF model for single cell RNA-seq data.

PeerJ. 2020 Oct 16;8:e10091. doi: 10.7717/peerj.10091. eCollection 2020.

Paneth Cell-Derived Lysozyme Defines the Composition of Mucolytic Microbiota and the Inflammatory Tone of the Intestine.

Immunity. 2020 Aug 18;53(2):398-416.e8. doi: 10.1016/j.immuni.2020.07.010.

Elevating EGFR-MAPK program by a nonconventional Cdc42 enhances intestinal epithelial survival and regeneration.

JCI Insight. 2020 Aug 20;5(16):135923. doi: 10.1172/jci.insight.135923.

Upper tract urothelial carcinoma has a luminal-papillary T-cell depleted contexture and activated FGFR3 signaling.

Nat Commun. 2019 Jul 5;10(1):2977. doi: 10.1038/s41467-019-10873-y.

Cell Type-Specific Roles of NF-κB Linking Inflammation and Thrombosis.

Front Immunol. 2019 Feb 4;10:85. doi: 10.3389/fimmu.2019.00085. eCollection 2019.

Hypoxia/reperfusion predisposes to atherosclerosis.

PLoS One. 2018 Oct 5;13(10):e0205067. doi: 10.1371/journal.pone.0205067. eCollection 2018.
