QUBIC2: a novel and robust biclustering algorithm for analyses and interpretation of large-scale RNA-Seq data.

Affiliations

Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA.

College of Computer Science and Technology, Jilin University, Changchun 130012, China.

Publication information

Bioinformatics. 2020 Feb 15;36(4):1143-1149. doi: 10.1093/bioinformatics/btz692.

Abstract

MOTIVATION

The biclustering of large-scale gene expression data holds promising potential for detecting condition-specific functional gene modules (i.e. biclusters). However, existing methods do not comprehensively detect all significant bicluster structures and have limited power when applied to expression data generated by RNA-Sequencing (RNA-Seq), especially single-cell RNA-Seq (scRNA-Seq) data, where large numbers of zero and low expression values are observed.

RESULTS

We present a new biclustering algorithm, QUalitative BIClustering algorithm Version 2 (QUBIC2), which is empowered by: (i) a novel left-truncated mixture of Gaussian model for accurate assessment of multimodality in zero-enriched expression data, (ii) a fast and efficient dropouts-saving expansion strategy for optimizing functional gene modules using information divergence and (iii) a rigorous statistical test for the significance of all identified biclusters in any organism, including those without substantial functional annotations. QUBIC2 demonstrated considerably improved performance in detecting biclusters compared with five other widely used algorithms on various benchmark datasets from E. coli, human and simulated data. QUBIC2 also showed robust and superior performance on gene expression data generated by microarray, bulk RNA-Seq and scRNA-Seq.

AVAILABILITY AND IMPLEMENTATION

The source code of QUBIC2 is freely available at https://github.com/OSU-BMBL/QUBIC2.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
