QUBIC2: a novel and robust biclustering algorithm for analyses and interpretation of large-scale RNA-Seq data.

Affiliations

Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA.

College of Computer Science and Technology, Jilin University, Changchun 130012, China.

Publication information

Bioinformatics. 2020 Feb 15;36(4):1143-1149. doi: 10.1093/bioinformatics/btz692.

Abstract

MOTIVATION

Biclustering of large-scale gene expression data holds promise for detecting condition-specific functional gene modules (i.e. biclusters). However, existing methods do not comprehensively detect all significant bicluster structures, and they have limited power when applied to expression data generated by RNA-Sequencing (RNA-Seq), especially single-cell RNA-Seq (scRNA-Seq) data, where large numbers of zero and low expression values are observed.

RESULTS

We present a new biclustering algorithm, QUalitative BIClustering algorithm Version 2 (QUBIC2), which is empowered by: (i) a novel left-truncated mixture-of-Gaussians model for accurate assessment of multimodality in zero-enriched expression data, (ii) a fast and efficient dropout-saving expansion strategy that optimizes functional gene modules using information divergence and (iii) a rigorous statistical test for the significance of all identified biclusters in any organism, including those without substantial functional annotations. QUBIC2 demonstrated considerably improved performance in detecting biclusters compared with five other widely used algorithms on various benchmark datasets from E. coli, human and simulated data. QUBIC2 also showed robust and superior performance on gene expression data generated by microarray, bulk RNA-Seq and scRNA-Seq.
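
As a rough illustration of how an information-divergence score could rank candidate genes during bicluster expansion, the sketch below computes the Kullback-Leibler divergence between a gene's distribution of discretized expression states inside a candidate bicluster's columns and its distribution across all columns. This is not taken from the QUBIC2 source: the function names (`state_distribution`, `kl_divergence`, `kl_score`), the pseudocount and the integer state encoding are all assumptions made for the example.

```python
import math
from collections import Counter


def state_distribution(states, alphabet, pseudocount=1e-9):
    """Empirical distribution of discretized states (hypothetical helper).

    A small pseudocount keeps every probability strictly positive so the
    logarithm in the divergence is always defined.
    """
    counts = Counter(states)
    total = len(states) + pseudocount * len(alphabet)
    return {s: (counts.get(s, 0) + pseudocount) / total for s in alphabet}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats over a shared alphabet."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p)


def kl_score(row_states, bicluster_columns):
    """Divergence between a gene's states inside the bicluster and overall.

    row_states: discretized expression states of one gene over all samples
                (assumed encoding: small integers, e.g. 0 = off, 1 = on).
    bicluster_columns: indices of the samples in the candidate bicluster.
    """
    alphabet = sorted(set(row_states))
    inside = [row_states[j] for j in bicluster_columns]
    p = state_distribution(inside, alphabet)
    q = state_distribution(row_states, alphabet)
    return kl_divergence(p, q)


# A gene that is "on" only in the bicluster's samples scores higher than a
# gene whose states alternate regardless of the bicluster.
columns = [0, 1, 2]
specific_gene = [1, 1, 1, 0, 0, 0, 0, 0]
unspecific_gene = [1, 0, 1, 0, 1, 0, 1, 0]
print(kl_score(specific_gene, columns) > kl_score(unspecific_gene, columns))
```

Under these assumptions, a higher score means the gene behaves differently inside the bicluster than in the background, which is the kind of signal an expansion step could use to decide whether to add the gene.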

AVAILABILITY AND IMPLEMENTATION

The source code of QUBIC2 is freely available at https://github.com/OSU-BMBL/QUBIC2.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
