Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks.

Author information

Reiss David J, Baliga Nitin S, Bonneau Richard

Affiliation

Institute for Systems Biology, 1441 N. 34th St, Seattle, WA 98103-8904, USA.

Publication information

BMC Bioinformatics. 2006 Jun 2;7:280. doi: 10.1186/1471-2105-7-280.

Abstract

BACKGROUND

The learning of global genetic regulatory networks from expression data is a severely under-constrained problem that is aided by reducing the dimensionality of the search space by means of clustering genes into putatively co-regulated groups, as opposed to those that are simply co-expressed. Because genes may be co-regulated only across a subset of all observed experimental conditions, biclustering (clustering of genes and conditions) is more appropriate than standard clustering. Co-regulated genes are also often functionally (physically, spatially, genetically, and/or evolutionarily) associated, and such a priori known or pre-computed associations can provide support for appropriately grouping genes. One important association is the presence of one or more common cis-regulatory motifs. In organisms where these motifs are not known, their de novo detection, integrated into the clustering algorithm, can help to guide the process towards more biologically parsimonious solutions.
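The argument for biclustering can be made concrete with a small calculation. The sketch below assumes the mean squared residue of Cheng and Church (2000) as a generic measure of how coherently a set of genes behaves across a set of conditions; it is an illustration only, not cMonkey's scoring function, and the toy matrix and the name `mean_squared_residue` are hypothetical. Three genes that shift together across the first three conditions form a perfectly coherent bicluster there, yet look incoherent once all six conditions are included.

```python
# Hypothetical illustration: mean squared residue (Cheng & Church, 2000)
# as a generic coherence score for a bicluster (rows = genes, cols = conditions).
# This is NOT cMonkey's scoring function.
from typing import Sequence


def mean_squared_residue(matrix: Sequence[Sequence[float]],
                         rows: Sequence[int],
                         cols: Sequence[int]) -> float:
    """Return the mean squared residue of the submatrix matrix[rows][cols].

    A value of 0 means every gene in `rows` follows the same profile over
    `cols`, up to a constant additive offset per gene.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue * residue
    return total / (n_rows * n_cols)


# Toy expression matrix (hypothetical data): 4 genes x 6 conditions.
# Genes 0-2 move together (additive offsets) in conditions 0-2 only.
expression = [
    [1.0, 2.0, 3.0, 0.5, -1.0, 2.0],
    [2.0, 3.0, 4.0, -2.0, 1.5, 0.0],
    [0.0, 1.0, 2.0, 1.0, 0.0, -1.5],
    [3.0, -1.0, 0.5, 2.0, 2.0, 2.0],
]

genes = [0, 1, 2]
bicluster_score = mean_squared_residue(expression, genes, [0, 1, 2])
all_conditions_score = mean_squared_residue(expression, genes, range(6))
print(f"bicluster (3 conditions): {bicluster_score:.3f}")
print(f"same genes, all conditions: {all_conditions_score:.3f}")
```

With this toy data the first score is exactly 0 and the second is clearly positive: clustering these genes over every condition would dilute precisely the signal that a bicluster restricted to conditions 0-2 isolates.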

RESULTS

We have developed an algorithm, cMonkey, that detects putative co-regulated gene groupings by integrating the biclustering of gene expression data and various functional associations with the de novo detection of sequence motifs.
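The integration described here can be pictured as combining several lines of evidence into one membership decision per gene. The sketch below is hypothetical: the `GeneEvidence` fields, the weights, and the logistic combination are assumptions chosen for illustration, not cMonkey's published scoring scheme. It only shows how motif and association evidence can change a decision that expression evidence alone would make differently.

```python
# Hypothetical sketch of evidence integration for bicluster membership.
# Field names, weights, and the logistic form are illustrative assumptions,
# not cMonkey's published scoring scheme.
import math
from dataclasses import dataclass


@dataclass
class GeneEvidence:
    """Per-gene support for joining one bicluster (positive = supportive)."""
    expression: float  # e.g. coherence with the bicluster's expression profile
    motif: float       # e.g. match to the bicluster's de novo detected motif
    network: float     # e.g. functional associations with current members


def membership_probability(ev: GeneEvidence,
                           w_expression: float = 1.0,
                           w_motif: float = 0.5,
                           w_network: float = 0.5) -> float:
    """Combine weighted evidence into a probability via the logistic function."""
    score = (w_expression * ev.expression
             + w_motif * ev.motif
             + w_network * ev.network)
    return 1.0 / (1.0 + math.exp(-score))


candidates = {
    "geneA": GeneEvidence(expression=2.0, motif=1.5, network=1.0),
    # Weak expression support, but shares the motif and network links:
    "geneB": GeneEvidence(expression=-0.5, motif=2.0, network=1.5),
    "geneC": GeneEvidence(expression=-1.0, motif=-1.0, network=0.0),
}

members = [name for name, ev in candidates.items()
           if membership_probability(ev) > 0.5]
print(members)  # ['geneA', 'geneB']
```

With these made-up numbers, geneB is admitted (combined score 1.25) even though its expression evidence alone (score -0.5) would exclude it; in a real integrated method the weights would need to be set or learned so that no single evidence type dominates.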

CONCLUSION

We have applied this procedure to the archaeon Halobacterium NRC-1, as part of our efforts to decipher its regulatory network. In addition, we used cMonkey on public data for three organisms in the other two domains of life: Helicobacter pylori, Saccharomyces cerevisiae, and Escherichia coli. The biclusters detected by cMonkey both recapitulated known biology and enabled novel predictions (some for Halobacterium were subsequently confirmed in the laboratory). For example, it identified the bacteriorhodopsin regulon, assigned to this regulon additional genes with apparently unrelated functions, and detected its known promoter motif. We have performed a thorough comparison of cMonkey results against other clustering methods, and find that cMonkey biclusters are more parsimonious with all available evidence for co-regulation.
