Simultaneous clustering of multiple gene expression and physical interaction datasets.

Affiliation

Department of Genetics, Rosetta Inpharmatics Merck, Seattle, Washington, United States of America.

Publication information

PLoS Comput Biol. 2010 Apr 15;6(4):e1000742. doi: 10.1371/journal.pcbi.1000742.

Abstract

Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm JointCluster that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene-products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternate methods in recovering clusters implanted in networks with high false positive rates. In systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.
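
To make the idea of a gene set that "clusters well in multiple networks" concrete, the following is a minimal sketch and not the JointCluster algorithm itself. It assumes conductance as the per-network quality measure (the abstract does not name one) and scores a candidate gene set by its worst conductance across networks, so a set scores well only if it is well separated in every network. The function names, gene labels, and edge weights are all hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# A weighted, undirected edge: (gene_u, gene_v, weight).
Edge = Tuple[str, str, float]


def conductance(edges: Iterable[Edge], cluster: Set[str]) -> float:
    """Weight of edges crossing the cluster boundary, divided by the
    smaller of the two sides' volumes (sum of weighted degrees).
    Lower values mean the cluster is better separated."""
    cut = 0.0      # total weight of edges with exactly one endpoint inside
    vol_in = 0.0   # weighted degree summed over genes inside the cluster
    vol_out = 0.0  # weighted degree summed over genes outside the cluster
    for u, v, w in edges:
        u_in, v_in = u in cluster, v in cluster
        if u_in != v_in:
            cut += w
        # Each endpoint contributes the edge weight to its own side's volume.
        for inside in (u_in, v_in):
            if inside:
                vol_in += w
            else:
                vol_out += w
    denom = min(vol_in, vol_out)
    # Treat a degenerate split (one side has no edges) as the worst score.
    return cut / denom if denom > 0 else 1.0


def joint_conductance(networks: Dict[str, Iterable[Edge]], cluster: Set[str]) -> float:
    """Worst (largest) conductance of the cluster over all networks.
    Taking the maximum is an illustrative assumption, not the paper's definition."""
    return max(conductance(edges, cluster) for edges in networks.values())


# Hypothetical toy data: one coexpression network and one physical network
# over the same six genes.
networks = {
    "coexpression": [
        ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
        ("c", "d", 0.5),
        ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ],
    "physical": [
        ("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0),
        ("d", "e", 1.0), ("e", "f", 1.0),
    ],
}

good = joint_conductance(networks, {"a", "b", "c"})  # coherent in both networks
bad = joint_conductance(networks, {"a", "d"})        # scattered in both networks
print(f"joint conductance {{a,b,c}}: {good:.3f}")
print(f"joint conductance {{a,d}}:   {bad:.3f}")
```

In this toy setup the set {a, b, c} receives a lower (better) joint score than {a, d}, because it is tightly connected and weakly attached to the rest in both networks. A full method would additionally search over candidate sets and handle networks on different weight scales, which this sketch does not attempt.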

