Identification of regulatory modules in time series gene expression data using a linear time biclustering algorithm.

Affiliation

Universidade da Beira Interior, Covilhã, KDBIO Group, INESC-ID, Lisbon, Portugal.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2010 Jan-Mar;7(1):153-65. doi: 10.1109/TCBB.2008.34.

DOI:10.1109/TCBB.2008.34

PMID:20150677

Abstract

Although most biclustering formulations are NP-hard, in time series expression data analysis, it is reasonable to restrict the problem to the identification of maximal biclusters with contiguous columns, which correspond to coherent expression patterns shared by a group of genes in consecutive time points. This restriction leads to a tractable problem. We propose an algorithm that finds and reports all maximal contiguous column coherent biclusters in time linear in the size of the expression matrix. The linear time complexity of CCC-Biclustering relies on the use of a discretized matrix and efficient string processing techniques based on suffix trees. We also propose a method for ranking biclusters based on their statistical significance and a methodology for filtering highly overlapping and, therefore, redundant biclusters. We report results in synthetic and real data showing the effectiveness of the approach and its relevance in the discovery of regulatory modules. Results obtained using the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress show not only the ability of the proposed methodology to extract relevant information compatible with documented biological knowledge but also the utility of using this algorithm in the study of other environmental stresses and of regulatory modules in general.
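The core idea of the abstract, restricting biclusters to contiguous columns of a discretized matrix, can be illustrated with a small sketch. The code below is not the authors' linear-time suffix-tree algorithm: it is a naive enumeration, quadratic in the number of columns, meant only to show what a maximal contiguous column coherent bicluster is. The three-symbol alphabet (`U`, `D`, `N` for up, down, no change), the `threshold` parameter, and the function names `discretize` and `ccc_biclusters` are assumptions made for illustration; the abstract states only that a discretized matrix is used.

```python
from collections import defaultdict


def discretize(matrix, threshold=0.5):
    """Turn each row of expression values into a string of symbols.

    Assumed scheme (not taken from the abstract): each symbol encodes the
    change between two consecutive time points as 'U' (up), 'D' (down),
    or 'N' (no change), using a fixed threshold.
    """
    patterns = []
    for row in matrix:
        symbols = []
        for prev, curr in zip(row, row[1:]):
            diff = curr - prev
            if diff > threshold:
                symbols.append("U")
            elif diff < -threshold:
                symbols.append("D")
            else:
                symbols.append("N")
        patterns.append("".join(symbols))
    return patterns


def ccc_biclusters(patterns, min_rows=2, min_cols=1):
    """Naively enumerate maximal contiguous column coherent biclusters.

    Returns tuples (genes, start, end, pattern), where columns start..end-1
    of the discretized matrix carry the same pattern in every listed gene.
    """
    n_cols = len(patterns[0]) if patterns else 0
    results = []
    for start in range(n_cols):
        for end in range(start + min_cols, n_cols + 1):
            # Group genes by their pattern in this column window, so each
            # group already contains every gene sharing that pattern.
            groups = defaultdict(list)
            for gene, pattern in enumerate(patterns):
                groups[pattern[start:end]].append(gene)
            for pattern, genes in groups.items():
                if len(genes) < min_rows:
                    continue
                # The window can grow if all genes agree on the next column.
                extends_left = start > 0 and len(
                    {patterns[g][start - 1] for g in genes}) == 1
                extends_right = end < n_cols and len(
                    {patterns[g][end] for g in genes}) == 1
                if not extends_left and not extends_right:
                    results.append((tuple(genes), start, end, pattern))
    return results


if __name__ == "__main__":
    expression = [
        [0, 1, 2, 1],
        [0, 1, 2, 3],
        [5, 6, 7, 6],
        [3, 2, 1, 0],
    ]
    for bicluster in ccc_biclusters(discretize(expression)):
        print(bicluster)
```

Note that column indices here refer to transitions between consecutive time points in the discretized matrix, so column 0 covers the change from the first to the second time point. A suffix-tree formulation, as described in the abstract, avoids re-scanning every column window and is what makes the linear time bound possible.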


Similar articles

Identification of regulatory modules in time series gene expression data using a linear time biclustering algorithm.

IEEE/ACM Trans Comput Biol Bioinform. 2010 Jan-Mar;7(1):153-65. doi: 10.1109/TCBB.2008.34.

A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series.

Algorithms Mol Biol. 2009 Jun 4;4:8. doi: 10.1186/1748-7188-4-8.

Gene expression data analysis using a novel approach to biclustering combining discrete and continuous data.

IEEE/ACM Trans Comput Biol Bioinform. 2008 Oct-Dec;5(4):583-93. doi: 10.1109/TCBB.2007.70251.

EDISA: extracting biclusters from multiple time-series of gene expression profiles.

BMC Bioinformatics. 2007 Sep 12;8:334. doi: 10.1186/1471-2105-8-334.

Identification of coherent patterns in gene expression data using an efficient biclustering algorithm and parallel coordinate visualization.

BMC Bioinformatics. 2008 Apr 23;9:210. doi: 10.1186/1471-2105-9-210.

LateBiclustering: Efficient Heuristic Algorithm for Time-Lagged Bicluster Identification.

IEEE/ACM Trans Comput Biol Bioinform. 2014 Sep-Oct;11(5):801-13. doi: 10.1109/TCBB.2014.2312007.

Parallelized evolutionary learning for detection of biclusters in gene expression data.

IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):560-70. doi: 10.1109/TCBB.2011.53. Epub 2011 Mar 3.

Query-driven module discovery in microarray data.

Bioinformatics. 2007 Oct 1;23(19):2573-80. doi: 10.1093/bioinformatics/btm387. Epub 2007 Aug 8.

Discovery of error-tolerant biclusters from noisy gene expression data.

BMC Bioinformatics. 2011 Nov 24;12 Suppl 12(Suppl 12):S1. doi: 10.1186/1471-2105-12-S12-S1.

A new geometric biclustering algorithm based on the Hough transform for analysis of large-scale microarray data.

J Theor Biol. 2008 Mar 21;251(2):264-74. doi: 10.1016/j.jtbi.2007.11.030. Epub 2007 Dec 4.

Cited by

Biclustering data analysis: a comprehensive survey.

Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae342.

Biclustering of Log Data: Insights from a Computer-Based Complex Problem Solving Assessment.

J Intell. 2024 Jan 17;12(1):10. doi: 10.3390/jintelligence12010010.

Triclustering-based classification of longitudinal data for prognostic prediction: targeting relevant clinical endpoints in amyotrophic lateral sclerosis.

Sci Rep. 2023 Apr 15;13(1):6182. doi: 10.1038/s41598-023-33223-x.

Gene differential co-expression analysis of male infertility patients based on statistical and machine learning methods.

Front Microbiol. 2023 Jan 27;14:1092143. doi: 10.3389/fmicb.2023.1092143. eCollection 2023.

Biclustering of medical monitoring data using a nonparametric hierarchical Bayesian model.

Stat (Int Stat Inst). 2020;9(1). doi: 10.1002/sta4.279. Epub 2020 Mar 15.

Biclustering fMRI time series: a comparative study.

BMC Bioinformatics. 2022 May 23;23(1):192. doi: 10.1186/s12859-022-04733-8.

Efficient Approximation of Statistical Significance in Local Trend Analysis of Dependent Time Series.

Front Genet. 2022 Apr 26;13:729011. doi: 10.3389/fgene.2022.729011. eCollection 2022.

A Multi-Level Iterative Bi-Clustering Method for Discovering miRNA Co-regulation Network of Abiotic Stress Tolerance in Soybeans.

Front Plant Sci. 2022 Apr 7;13:860791. doi: 10.3389/fpls.2022.860791. eCollection 2022.

G-Tric: generating three-way synthetic datasets with triclustering solutions.

BMC Bioinformatics. 2021 Jan 7;22(1):16. doi: 10.1186/s12859-020-03925-4.

Statistical significance approximation for local similarity analysis of dependent time series data.

BMC Bioinformatics. 2019 Jan 28;20(1):53. doi: 10.1186/s12859-019-2595-x.
