Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks.

Author information

Reiss David J, Baliga Nitin S, Bonneau Richard

Affiliation

Institute for Systems Biology, 1441 N, 34th St, Seattle, WA 98103-8904, USA.

Publication information

BMC Bioinformatics. 2006 Jun 2;7:280. doi: 10.1186/1471-2105-7-280.

DOI:10.1186/1471-2105-7-280

PMID:16749936

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1502140/

Abstract

BACKGROUND

The learning of global genetic regulatory networks from expression data is a severely under-constrained problem that is aided by reducing the dimensionality of the search space by means of clustering genes into putatively co-regulated groups, as opposed to those that are simply co-expressed. Because genes may be co-regulated only across a subset of all observed experimental conditions, biclustering (clustering of genes and conditions) is more appropriate than standard clustering. Co-regulated genes are also often functionally (physically, spatially, genetically, and/or evolutionarily) associated, and such a priori known or pre-computed associations can provide support for appropriately grouping genes. One important association is the presence of one or more common cis-regulatory motifs. In organisms where these motifs are not known, their de novo detection, integrated into the clustering algorithm, can help to guide the process towards more biologically parsimonious solutions.
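To make the case for biclustering concrete, the sketch below scores how coherently a set of genes behaves over a chosen set of conditions. It uses the mean squared residue of Cheng and Church, a standard bicluster coherence measure that is swapped in here for illustration; the abstract does not say that cMonkey uses this score. The function name `mean_squared_residue` and the toy matrix `expression` are assumptions made for this example.

```python
from statistics import mean

def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of the submatrix rows x cols.

    Lower values mean the selected genes follow a more coherent
    (additive) pattern across the selected conditions.
    """
    sub = [[matrix[r][c] for c in cols] for r in rows]
    row_means = [mean(row) for row in sub]
    col_means = [mean(col) for col in zip(*sub)]
    overall = mean(row_means)  # equals the mean of all entries (equal row lengths)
    total = 0.0
    for i, row in enumerate(sub):
        for j, value in enumerate(row):
            residue = value - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (len(rows) * len(cols))

# Hypothetical toy data: 4 genes (rows) x 6 conditions (columns).
# Genes 0-2 move together in conditions 0-2 only.
expression = [
    [1.0, 2.0, 3.0, 0.5, 4.0, 1.0],
    [2.0, 3.0, 4.0, 5.0, 0.0, 2.5],
    [0.0, 1.0, 2.0, 3.0, 3.0, 0.0],
    [4.0, 0.5, 1.0, 2.0, 2.0, 2.0],
]

genes = [0, 1, 2]
msr_subset = mean_squared_residue(expression, genes, [0, 1, 2])
msr_all = mean_squared_residue(expression, genes, list(range(6)))
print(f"conditions 0-2: {msr_subset:.3f}, all conditions: {msr_all:.3f}")
```

In this toy matrix the three genes are perfectly coherent over the first three conditions but not over all six, so a method that clusters genes across every condition would tend to miss the group.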

RESULTS

We have developed an algorithm, cMonkey, that detects putative co-regulated gene groupings by integrating the biclustering of gene expression data and various functional associations with the de novo detection of sequence motifs.
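As a rough illustration of what integrating several evidence types can mean, the sketch below ranks candidate genes for a bicluster by a weighted sum of log p-values from three sources. This is only an assumed combination rule: the abstract does not describe how cMonkey combines its evidence, and the names `joint_score`, `weights`, and `candidates`, as well as all the numbers, are hypothetical.

```python
import math

def joint_score(p_values, weights):
    """Weighted sum of log10 p-values; more negative means stronger joint support."""
    return sum(weights[kind] * math.log10(p_values[kind]) for kind in weights)

# Assumed relative weights for each evidence type (hypothetical values).
weights = {"expression": 1.0, "motif": 0.5, "network": 0.5}

# Hypothetical per-gene p-values for membership in one candidate bicluster.
candidates = {
    "geneA": {"expression": 1e-4, "motif": 1e-3, "network": 1e-2},
    "geneB": {"expression": 1e-4, "motif": 0.5, "network": 0.8},
    "geneC": {"expression": 0.3, "motif": 1e-3, "network": 1e-2},
}

# Sort from strongest to weakest joint support.
ranked = sorted(candidates, key=lambda g: joint_score(candidates[g], weights))
for gene in ranked:
    print(gene, round(joint_score(candidates[gene], weights), 2))
```

Here geneA and geneB are equally well co-expressed, but geneA also has motif and association support and therefore ranks first, which mirrors the abstract's distinction between genes that are co-regulated and genes that are merely co-expressed.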

CONCLUSION

We have applied this procedure to the archaeon Halobacterium NRC-1, as part of our efforts to decipher its regulatory network. In addition, we used cMonkey on public data for three organisms in the other two domains of life: Helicobacter pylori, Saccharomyces cerevisiae, and Escherichia coli. The biclusters detected by cMonkey both recapitulated known biology and enabled novel predictions (some for Halobacterium were subsequently confirmed in the laboratory). For example, it identified the bacteriorhodopsin regulon, assigned additional genes to this regulon with apparently unrelated function, and detected its known promoter motif. We have performed a thorough comparison of cMonkey results against other clustering methods, and find that cMonkey biclusters are more parsimonious with all available evidence for co-regulation.
