A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series.

Author information

Madeira Sara C, Oliveira Arlindo L

Affiliation

Knowledge Discovery and Bioinformatics (KDBIO) group, INESC-ID, Lisbon, Portugal.

Publication information

Algorithms Mol Biol. 2009 Jun 4;4:8. doi: 10.1186/1748-7188-4-8.

DOI:10.1186/1748-7188-4-8

PMID:19497096

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2709627/

Abstract

BACKGROUND

The ability to monitor the change in expression patterns over time, and to observe the emergence of coherent temporal responses using gene expression time series, obtained from microarray experiments, is critical to advance our understanding of complex biological processes. In this context, biclustering algorithms have been recognized as an important tool for the discovery of local expression patterns, which are crucial to unravel potential regulatory mechanisms. Although most formulations of the biclustering problem are NP-hard, when working with time series expression data the interesting biclusters can be restricted to those with contiguous columns. This restriction leads to a tractable problem and enables the design of efficient biclustering algorithms able to identify all maximal contiguous column coherent biclusters.
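
As a rough illustration of why the contiguous-column restriction keeps the search tractable, the sketch below enumerates only the O(m^2) contiguous column ranges of a small, already discretized matrix and groups the rows that share the same symbol string in each range. This is a naive enumeration for intuition only, not the authors' algorithm; the function name `maximal_ccc_biclusters`, the U/D/N alphabet, and the toy data are assumptions made for this example.

```python
from collections import defaultdict

def maximal_ccc_biclusters(rows, min_genes=2, min_cols=2):
    """Naively list maximal biclusters with contiguous columns and identical patterns.

    rows maps a gene name to a string of symbols (one symbol per time point).
    Returns tuples (start, end, pattern, genes) with `end` exclusive.
    """
    n_cols = len(next(iter(rows.values())))
    found = []
    # Only contiguous ranges [start, end) are considered: O(n_cols^2) of them.
    for start in range(n_cols):
        for end in range(start + min_cols, n_cols + 1):
            groups = defaultdict(list)
            for gene, symbols in rows.items():
                groups[symbols[start:end]].append(gene)
            for pattern, genes in groups.items():
                if len(genes) < min_genes:
                    continue
                # Skip the group if every gene agrees on the neighbouring column,
                # because a wider bicluster with the same genes then exists.
                extends_left = start > 0 and len({rows[g][start - 1] for g in genes}) == 1
                extends_right = end < n_cols and len({rows[g][end] for g in genes}) == 1
                if not (extends_left or extends_right):
                    found.append((start, end, pattern, sorted(genes)))
    return found

# Hypothetical discretized matrix: U = up, D = down, N = no change.
toy = {
    "g1": "UDUN",
    "g2": "UDUD",
    "g3": "NDUN",
    "g4": "UUDN",
}

for start, end, pattern, genes in maximal_ccc_biclusters(toy):
    print(f"columns {start}..{end - 1}  pattern {pattern}  genes {genes}")
```

Without the contiguity restriction, every subset of columns would be a candidate, which is where the exponential blow-up of general biclustering formulations comes from.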

METHODS

In this work, we propose e-CCC-Biclustering, a biclustering algorithm that finds and reports all maximal contiguous column coherent biclusters with approximate expression patterns in time polynomial in the size of the time series gene expression matrix. This polynomial time complexity is achieved by manipulating a discretized version of the original matrix using efficient string processing techniques. We also propose extensions that handle missing values and discover anticorrelated and scaled expression patterns, as well as different ways to compute the errors allowed in the expression patterns. Finally, we propose a scoring criterion that combines the statistical significance of expression patterns with a similarity measure between overlapping biclusters.
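
The two ideas in this paragraph, discretizing the matrix and tolerating a bounded number of errors per gene, can be sketched in a few lines. The abstract does not specify the discretization scheme or the error measure, so the example below assumes a simple up/down/no-change encoding of consecutive differences with a hypothetical `threshold`, and counts errors as symbol substitutions (Hamming distance). The function names and the toy expression values are illustrative only, and the sketch makes no attempt at the efficient string processing the authors describe.

```python
def discretize(series, threshold=0.5):
    """Encode each step between consecutive time points as U (up), D (down) or N (no change)."""
    symbols = []
    for prev, curr in zip(series, series[1:]):
        delta = curr - prev
        if delta > threshold:
            symbols.append("U")
        elif delta < -threshold:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)

def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def genes_matching(rows, pattern, start, max_errors):
    """Genes whose symbols in columns [start, start + len(pattern)) differ from
    `pattern` in at most `max_errors` positions."""
    end = start + len(pattern)
    return sorted(
        gene for gene, symbols in rows.items()
        if hamming(symbols[start:end], pattern) <= max_errors
    )

# Hypothetical expression values for three genes over five time points.
expression = {
    "g1": [0.0, 1.0, 0.1, 1.2, 1.3],
    "g2": [0.2, 1.5, 0.3, 0.4, 0.5],
    "g3": [1.0, 1.1, 0.0, 1.0, 1.1],
}
rows = {gene: discretize(values) for gene, values in expression.items()}

print(rows)
print(genes_matching(rows, "UDUN", start=0, max_errors=0))
print(genes_matching(rows, "UDUN", start=0, max_errors=1))
```

With exact matching only one gene follows the pattern, while allowing a single substitution groups all three, which is the kind of gain from approximate patterns that the abstract refers to.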

RESULTS

We present results on real data showing the effectiveness of e-CCC-Biclustering and its relevance in the discovery of regulatory modules describing the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress. In particular, the results show the advantage of considering approximate patterns when compared to state-of-the-art methods that require exact matching of gene expression time series.

DISCUSSION

The identification of co-regulated genes, involved in specific biological processes, remains one of the main avenues open to researchers studying gene regulatory networks. The ability of the proposed methodology to efficiently identify sets of genes with similar expression patterns is shown to be instrumental in the discovery of relevant biological phenomena, leading to more convincing evidence of specific regulatory mechanisms.

AVAILABILITY

A prototype implementation of the algorithm, coded in Java, together with the dataset and examples used in the paper, is available at http://kdbio.inesc-id.pt/software/e-ccc-biclustering.
