

A comparative analysis of biclustering algorithms for gene expression data.

Affiliation

Department of Computer Science and Engineering, The Ohio State University, 3165 Graves Hall, 333 West 10th Avenue, Columbus, OH 43210, USA.

Publication information

Brief Bioinform. 2013 May;14(3):279-92. doi: 10.1093/bib/bbs032. Epub 2012 Jul 6.

Abstract

The need to analyze high-dimensional biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibits similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared only to a small number of algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they are quickly outdated. In this article we partially address this problem of evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare 12 algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic data sets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters and overlapping biclusters. The algorithms were also tested on eight large gene expression data sets obtained from the Gene Expression Omnibus. Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on the desired model, whether that model allows overlapping biclusters, and its robustness to noise. In addition, we observe that the biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters.
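The abstract describes two evaluation routes: recovering biclusters implanted in synthetic data, and testing biclusters found in real data for Gene Ontology enrichment. Both can be sketched in plain Python. This is a minimal illustration under stated assumptions, not BiBench's API and not necessarily the scoring the authors used. `implant_constant_bicluster` is a hypothetical helper that plants one constant-model bicluster in a standard-normal background. `recovery_score` is a hypothetical Jaccard-over-cells match score. `go_enrichment_pvalue` is a one-sided hypergeometric test, a common choice for GO term over-representation, shown without any multiple-testing correction.

```python
import random
from math import comb
from typing import FrozenSet, List, Tuple

# A bicluster is a (row indices, column indices) pair: genes x conditions.
Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]


def implant_constant_bicluster(n_rows: int, n_cols: int, bic: Bicluster,
                               value: float = 3.0, noise_sd: float = 0.5,
                               seed: int = 0) -> List[List[float]]:
    """Standard-normal background with one constant-model bicluster implanted.

    Cells inside `bic` are set to `value` plus Gaussian noise (sd = noise_sd).
    Hypothetical helper; other models (shifting, scaling) are not covered.
    """
    rng = random.Random(seed)
    rows, cols = bic
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(n_cols)]
              for _ in range(n_rows)]
    for r in rows:
        for c in cols:
            matrix[r][c] = value + rng.gauss(0.0, noise_sd)
    return matrix


def jaccard(a: Bicluster, b: Bicluster) -> float:
    """Jaccard index between the sets of matrix cells covered by a and b."""
    cells_a = {(r, c) for r in a[0] for c in a[1]}
    cells_b = {(r, c) for r in b[0] for c in b[1]}
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 1.0


def recovery_score(expected: List[Bicluster], found: List[Bicluster]) -> float:
    """Mean, over implanted biclusters, of the best Jaccard match among found ones."""
    if not expected:
        return 1.0
    best = [max((jaccard(e, f) for f in found), default=0.0) for e in expected]
    return sum(best) / len(best)


def go_enrichment_pvalue(universe: int, annotated: int,
                         selected: int, hits: int) -> float:
    """One-sided hypergeometric P(X >= hits).

    universe: genes on the array; annotated: genes carrying the GO term;
    selected: genes in the bicluster; hits: bicluster genes carrying the term.
    """
    upper = min(selected, annotated)
    tail = sum(comb(annotated, i) * comb(universe - annotated, selected - i)
               for i in range(hits, upper + 1))
    return tail / comb(universe, selected)


if __name__ == "__main__":
    true_bic = (frozenset(range(10)), frozenset(range(5)))
    # `data` is what would be handed to a biclustering algorithm.
    data = implant_constant_bicluster(50, 20, true_bic, noise_sd=0.5)
    # A hypothetical algorithm output that misses two of the ten rows.
    found = [(frozenset(range(8)), frozenset(range(5)))]
    print(recovery_score([true_bic], found))  # 40 / 50 cells -> 0.8
    print(go_enrichment_pvalue(universe=10, annotated=5, selected=5, hits=5))
```

Varying `noise_sd`, the number of implanted biclusters, or their overlap reproduces the kinds of conditions the abstract lists. Overlapping biclusters would additionally need a rule for combining implanted values (for example, adding them), which this sketch leaves out.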
