Identification of coherent patterns in gene expression data using an efficient biclustering algorithm and parallel coordinate visualization.

Authors

Cheng Kin-On, Law Ngai-Fong, Siu Wan-Chi, Liew Alan Wee-Chung

Affiliation

School of Information and Communication Technology, Griffith University, Gold Coast Campus, QLD 4222, Queensland, Australia.

Publication

BMC Bioinformatics. 2008 Apr 23;9:210. doi: 10.1186/1471-2105-9-210.

DOI:10.1186/1471-2105-9-210

PMID:18433478

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2396181/

Abstract

BACKGROUND

DNA microarray technology allows the expression levels of thousands of genes to be measured under tens to hundreds of different conditions. In microarray data, genes with similar functions usually co-express only under certain conditions [1]. Thus, biclustering, which clusters genes and conditions simultaneously, is preferred over traditional clustering for discovering these coherent genes. Various biclustering algorithms have been developed using different bicluster formulations. Unfortunately, many useful formulations result in NP-complete problems. In this article, we investigate an efficient method for identifying a popular type of bicluster described by the additive model. Furthermore, parallel coordinate (PC) plots are used for bicluster visualization and analysis.
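As a minimal illustration of the additive model named above (a sketch, not code from the article): in an additive bicluster each entry can be written as a base value plus a gene offset plus a condition offset, so for any two genes x, y and any two conditions a, b the quantity (d_xa - d_xb) - (d_ya - d_yb) is zero. The pCluster literature calls the absolute value of this quantity the pScore of the 2x2 submatrix. The function names and toy values below are assumptions chosen for illustration only.

```python
from itertools import combinations


def make_additive_bicluster(base, gene_offsets, condition_offsets):
    """Build a matrix whose entries are base + gene offset + condition offset."""
    return [[base + g + c for c in condition_offsets] for g in gene_offsets]


def max_pscore(matrix):
    """Largest pScore over all 2x2 submatrices (0 for a perfect additive bicluster)."""
    worst = 0.0
    for x, y in combinations(range(len(matrix)), 2):
        for a, b in combinations(range(len(matrix[0])), 2):
            score = abs((matrix[x][a] - matrix[x][b]) - (matrix[y][a] - matrix[y][b]))
            worst = max(worst, score)
    return worst


# Hypothetical toy data: 3 genes x 4 conditions, values chosen to be exact in binary.
bicluster = make_additive_bicluster(2.0, [0.0, 1.0, -0.5], [0.0, 0.5, 2.0, -1.0])
clean_score = max_pscore(bicluster)

# Perturb a single entry to mimic noise; the pScore grows by the same amount.
noisy = [row[:] for row in bicluster]
noisy[0][0] += 0.75
noisy_score = max_pscore(noisy)

print(clean_score, noisy_score)
```

A homogeneity threshold on this score is what lets a method tolerate noise: a submatrix is accepted when its largest pScore stays below the chosen threshold rather than being exactly zero.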

RESULTS

We develop a novel and efficient biclustering algorithm that can be regarded as a greedy version of an existing algorithm known as the pCluster algorithm. By relaxing the homogeneity constraint, the proposed algorithm has polynomial-time complexity in the worst case, instead of the exponential-time complexity of the pCluster algorithm. Experiments on artificial datasets verify that our algorithm can identify both additive-related and multiplicative-related biclusters in the presence of overlap and noise. Biologically significant biclusters have been validated on the yeast cell-cycle expression dataset using Gene Ontology annotations. A comparative study shows that the proposed approach outperforms several existing biclustering algorithms. We also provide an interactive exploratory tool based on PC plot visualization for determining the parameters of our biclustering algorithm.
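The abstract does not spell out the algorithm's steps, so the following is only a hedged sketch of one building block commonly used in pattern-based biclustering, not the authors' method: for a pair of conditions, genes that follow the same additive pattern have nearly equal expression differences, so sorting the differences and sliding a window of width `delta` finds the largest such gene group in O(n log n) time. The sketch also shows why a multiplicative-related bicluster can be handled the same way after a log transform. All names, data values, and the `delta` parameter are hypothetical.

```python
import math


def largest_coherent_group(matrix, col_a, col_b, delta):
    """Return the largest set of row indices whose difference between two
    columns varies by at most delta (sorted differences + sliding window)."""
    diffs = sorted((row[col_a] - row[col_b], i) for i, row in enumerate(matrix))
    best = []
    start = 0
    for end in range(len(diffs)):
        # Shrink the window until its spread is within delta.
        while diffs[end][0] - diffs[start][0] > delta:
            start += 1
        if end - start + 1 > len(best):
            best = [i for _, i in diffs[start:end + 1]]
    return sorted(best)


# Hypothetical additive example: genes 0-2 share a pattern (with small noise),
# genes 3-4 are background.
additive_data = [
    [1.0, 3.0],
    [2.0, 4.1],
    [0.5, 2.45],
    [5.0, 1.0],
    [3.0, 3.0],
]
additive_group = largest_coherent_group(additive_data, 0, 1, delta=0.2)

# Hypothetical multiplicative example: genes 0-2 are scaled copies of each other.
# Taking logarithms turns constant ratios into constant differences.
multiplicative_data = [
    [2.0, 8.0],
    [4.0, 16.0],
    [1.0, 4.0],
    [8.0, 2.0],
    [4.0, 4.0],
]
log_data = [[math.log2(v) for v in row] for row in multiplicative_data]
multiplicative_group = largest_coherent_group(log_data, 0, 1, delta=0.1)

print(additive_group, multiplicative_group)
```

A full biclustering method would need to combine such pairwise groups across many conditions; how the published algorithm does this greedily is described in the article itself, not here.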

CONCLUSION

We have proposed a novel biclustering algorithm that works with PC plots for interactive exploratory analysis of gene expression data. Experiments show that the biclustering algorithm is efficient and capable of detecting co-regulated genes. The interactive analysis helps determine the optimal parameters of the biclustering algorithm so as to achieve the best result. In the future, we will modify the proposed algorithm for other bicluster models, such as the coherent evolution model.

Similar articles

BiVisu: software tool for bicluster detection and visualization.

Bioinformatics. 2007 Sep 1;23(17):2342-4. doi: 10.1093/bioinformatics/btm338. Epub 2007 Jun 22.

Discovering biclusters in gene expression data based on high-dimensional linear geometries.

BMC Bioinformatics. 2008 Apr 23;9:209. doi: 10.1186/1471-2105-9-209.

Parallelized evolutionary learning for detection of biclusters in gene expression data.

IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):560-70. doi: 10.1109/TCBB.2011.53. Epub 2011 Mar 3.

Recent patents on biclustering algorithms for gene expression data analysis.

Recent Pat DNA Gene Seq. 2011 Aug;5(2):117-25. doi: 10.2174/187221511796392097.

A general framework for biclustering gene expression data.

J Bioinform Comput Biol. 2006 Aug;4(4):911-33. doi: 10.1142/s021972000600217x.

Application of simulated annealing to the biclustering of gene expression data.

IEEE Trans Inf Technol Biomed. 2006 Jul;10(3):519-25. doi: 10.1109/titb.2006.872073.

VisBicluster: A Matrix-Based Bicluster Visualization of Expression Data.

J Comput Biol. 2020 Sep;27(9):1384-1396. doi: 10.1089/cmb.2019.0385. Epub 2020 Feb 7.

QUBIC: a qualitative biclustering algorithm for analyses of gene expression data.

Nucleic Acids Res. 2009 Aug;37(15):e101. doi: 10.1093/nar/gkp491. Epub 2009 Jun 9.

Discovery of error-tolerant biclusters from noisy gene expression data.

BMC Bioinformatics. 2011 Nov 24;12 Suppl 12(Suppl 12):S1. doi: 10.1186/1471-2105-12-S12-S1.

Cited by

RUBic: rapid unsupervised biclustering.

BMC Bioinformatics. 2023 Nov 16;24(1):435. doi: 10.1186/s12859-023-05534-3.

A binary biclustering algorithm based on the adjacency difference matrix for gene expression data analysis.

BMC Bioinformatics. 2022 Sep 19;23(1):381. doi: 10.1186/s12859-022-04842-4.

DRAXIN as a Novel Diagnostic Marker to Predict the Poor Prognosis of Glioma Patients.

J Mol Neurosci. 2022 Oct;72(10):2136-2149. doi: 10.1007/s12031-022-02054-2. Epub 2022 Aug 30.

COSCEB: Comprehensive search for column-coherent evolution biclusters and its application to hub gene identification.

J Biosci. 2019 Jun;44(2).

Factor Analysis of MYB Gene Expression and Flavonoid Affecting Petal Color in Three Crabapple Cultivars.

Front Plant Sci. 2017 Feb 7;8:137. doi: 10.3389/fpls.2017.00137. eCollection 2017.

Identifying Multi-Dimensional Co-Clusters in Tensors Based on Hyperplane Detection in Singular Vector Spaces.

PLoS One. 2016 Sep 6;11(9):e0162293. doi: 10.1371/journal.pone.0162293. eCollection 2016.

A novel biclustering algorithm of binary microarray data: BiBinCons and BiBinAlter.

BioData Min. 2015 Nov 30;8:38. doi: 10.1186/s13040-015-0070-4. eCollection 2015.

Furby: fuzzy force-directed bicluster visualization.

BMC Bioinformatics. 2014;15 Suppl 6(Suppl 6):S4. doi: 10.1186/1471-2105-15-S6-S4. Epub 2014 May 16.

Identification of bicluster regions in a binary matrix and its applications.

PLoS One. 2013 Aug 5;8(8):e71680. doi: 10.1371/journal.pone.0071680. Print 2013.

Seed-based biclustering of gene expression data.

PLoS One. 2012;7(8):e42431. doi: 10.1371/journal.pone.0042431. Epub 2012 Aug 3.

References

Discovering biclusters in gene expression data based on high-dimensional linear geometries.

BMC Bioinformatics. 2008 Apr 23;9:209. doi: 10.1186/1471-2105-9-209.

A new geometric biclustering algorithm based on the Hough transform for analysis of large-scale microarray data.

J Theor Biol. 2008 Mar 21;251(2):264-74. doi: 10.1016/j.jtbi.2007.11.030. Epub 2007 Dec 4.

Visualization of microarray gene expression data.

Bioinformation. 2006 May 3;1(4):141-5. doi: 10.6026/97320630001141.

BiVisu: software tool for bicluster detection and visualization.

Bioinformatics. 2007 Sep 1;23(17):2342-4. doi: 10.1093/bioinformatics/btm338. Epub 2007 Jun 22.

Biclustering algorithms for biological data analysis: a survey.

IEEE/ACM Trans Comput Biol Bioinform. 2004 Jan-Mar;1(1):24-45. doi: 10.1109/TCBB.2004.2.

Discovering coherent biclusters from gene expression data using zero-suppressed binary decision diagrams.

IEEE/ACM Trans Comput Biol Bioinform. 2005 Oct-Dec;2(4):339-54. doi: 10.1109/TCBB.2005.55.

BicAT: a biclustering analysis toolbox.

Bioinformatics. 2006 May 15;22(10):1282-3. doi: 10.1093/bioinformatics/btl099. Epub 2006 Mar 21.

A systematic comparison and evaluation of biclustering methods for gene expression data.

Bioinformatics. 2006 May 1;22(9):1122-9. doi: 10.1093/bioinformatics/btl060. Epub 2006 Feb 24.

Cluster analysis of gene expression data based on self-splitting and merging competitive learning.

IEEE Trans Inf Technol Biomed. 2004 Mar;8(1):5-15. doi: 10.1109/titb.2004.824724.

Defining transcription modules using large-scale gene expression data.

Bioinformatics. 2004 Sep 1;20(13):1993-2003. doi: 10.1093/bioinformatics/bth166. Epub 2004 Mar 25.
