QUBIC2: a novel and robust biclustering algorithm for analyses and interpretation of large-scale RNA-Seq data.

Affiliations

Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA.

College of Computer Science and Technology, Jilin University, Changchun 130012, China.

Publication information

Bioinformatics. 2020 Feb 15;36(4):1143-1149. doi: 10.1093/bioinformatics/btz692.

DOI: 10.1093/bioinformatics/btz692

PMID: 31503285

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8215922/

Abstract

MOTIVATION

The biclustering of large-scale gene expression data holds promising potential for detecting condition-specific functional gene modules (i.e. biclusters). However, existing methods do not adequately address a comprehensive detection of all significant bicluster structures and have limited power when applied to expression data generated by RNA-Sequencing (RNA-Seq), especially single-cell RNA-Seq (scRNA-Seq) data, where massive zero and low expression values are observed.

RESULTS

We present a new biclustering algorithm, QUalitative BIClustering algorithm Version 2 (QUBIC2), which is empowered by: (i) a novel left-truncated mixture of Gaussian model for an accurate assessment of multimodality in zero-enriched expression data, (ii) a fast and efficient dropouts-saving expansion strategy for optimizing functional gene modules using information divergence and (iii) a rigorous statistical test for the significance of all the identified biclusters in any organism, including those without substantial functional annotations. QUBIC2 demonstrated considerably improved performance in detecting biclusters compared with five other widely used algorithms on various benchmark datasets from E. coli, human and simulated data. QUBIC2 also showcased robust and superior performance on gene expression data generated by microarray, bulk RNA-Seq and scRNA-Seq.
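
To make point (i) more concrete, the following is a minimal sketch of how a log-likelihood for a Gaussian mixture with a left cutoff could be evaluated. It is not QUBIC2's implementation: the abstract gives no equations, so the treatment of values at or below the cutoff as censored observations, the function names and all parameters here are assumptions made for illustration only.

```python
import math


def norm_pdf(x, mu, sd):
    """Density of a normal distribution N(mu, sd^2) at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def norm_cdf(x, mu, sd):
    """Cumulative probability of N(mu, sd^2) up to x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def left_censored_mixture_loglik(values, cutoff, weights, means, sds):
    """Log-likelihood of a Gaussian mixture with a left cutoff.

    Assumption (not taken from the paper): a value at or below `cutoff`
    is treated as censored, so it contributes the mixture's probability
    mass below the cutoff instead of a density at its observed value.
    """
    components = list(zip(weights, means, sds))
    total = 0.0
    for x in values:
        if x <= cutoff:
            lik = sum(w * norm_cdf(cutoff, m, s) for w, m, s in components)
        else:
            lik = sum(w * norm_pdf(x, m, s) for w, m, s in components)
        total += math.log(lik)
    return total


if __name__ == "__main__":
    # Hypothetical log-expression values for one gene; zeros mimic dropouts.
    expr = [0.0, 0.0, 0.0, 1.1, 0.9, 1.0, 4.1, 3.9, 4.0]
    one_peak = left_censored_mixture_loglik(expr, 0.0, [1.0], [1.7], [1.7])
    two_peaks = left_censored_mixture_loglik(
        expr, 0.0, [0.5, 0.5], [0.5, 4.0], [0.8, 0.3]
    )
    print(one_peak, two_peaks)
```

Comparing such log-likelihoods across different numbers of components is one common way to assess multimodality; whether QUBIC2 selects components this way is not stated in the abstract.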

AVAILABILITY AND IMPLEMENTATION

The source code of QUBIC2 is freely available at https://github.com/OSU-BMBL/QUBIC2.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
