

Multi-study inference of regulatory networks for more accurate models of gene regulation.

Affiliations

New York University, New York, NY 10003, USA.

Center for Computational Biology, Flatiron Institute, New York, NY 10010, USA.

Publication details

PLoS Comput Biol. 2019 Jan 24;15(1):e1006591. doi: 10.1371/journal.pcbi.1006591. eCollection 2019 Jan.

Abstract

Gene regulatory networks are composed of sub-networks that are often shared across biological processes, cell types, and organisms. Leveraging multiple sources of information, such as publicly available gene expression datasets, could therefore be helpful when learning a network of interest. Integrating data across different studies, however, raises numerous technical concerns. Hence, a common approach in network inference, and broadly in genomics research, is to learn models separately from each dataset and combine the results. Individual models, however, often suffer from under-sampling, poor generalization, and limited network recovery. In this study, we explore previous integration strategies, such as batch correction and model ensembles, and introduce a new multitask learning approach for joint network inference across several datasets. Our method first estimates the activities of transcription factors and then infers the relevant network topology. As regulatory interactions are context-dependent, we estimate model coefficients as a combination of both dataset-specific and conserved components. In addition, adaptive penalties may be used to favor models that include interactions derived from multiple sources of prior knowledge, including orthogonal genomics experiments. We evaluate generalization and network recovery using examples from Bacillus subtilis and Saccharomyces cerevisiae, and show that sharing information across models improves network reconstruction. Finally, we demonstrate robustness to both false positives in the prior information and heterogeneity among datasets.
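The two modeling steps in the abstract (estimating transcription factor activities, then fitting coefficients built from conserved and dataset-specific components with prior-weighted penalties) can be illustrated with a minimal NumPy sketch. It assumes that activities are obtained by least squares against a prior network matrix (expression ≈ prior × activities) and that the conserved/specific split is a "dirty" multitask lasso: a row-wise group penalty on a shared matrix B, an elementwise lasso on a specific matrix S, and both penalties scaled per TF by prior weights. The function names, parameters, and synthetic data are hypothetical; this is not the authors' published implementation.

```python
import numpy as np

# Hypothetical sketch; names and penalty choices are assumptions, not the authors' code.


def estimate_tfa(expression, prior):
    """Least-squares TF activities: solve X ~= P @ A for A via the pseudoinverse.

    expression: genes x samples matrix X; prior: genes x TFs matrix P
    (assumed signed prior network). Returns a TFs x samples matrix A.
    """
    return np.linalg.pinv(prior) @ expression


def fit_shared_specific(Xs, ys, prior_weights, lam_shared, lam_specific, n_iter=500):
    """Fit W = B + S for one target gene across K datasets by proximal gradient.

    Xs[k]: samples x TFs activity matrix for dataset k; ys[k]: target expression.
    B (TFs x K) gets a row-wise group-lasso penalty (conserved across datasets);
    S (TFs x K) gets an elementwise lasso penalty (dataset-specific).
    prior_weights scales both penalties per TF; smaller = favored by the prior.
    """
    K, p = len(Xs), Xs[0].shape[1]
    w = np.asarray(prior_weights, dtype=float)
    B, S = np.zeros((p, K)), np.zeros((p, K))

    def objective(B, S):
        loss = sum(np.sum((y - X @ (B[:, k] + S[:, k])) ** 2) / (2 * len(y))
                   for k, (X, y) in enumerate(zip(Xs, ys)))
        return (loss + lam_shared * np.sum(w * np.linalg.norm(B, axis=1))
                + lam_specific * np.sum(w[:, None] * np.abs(S)))

    # The loss depends only on B + S, so its gradient in (B, S) is Lipschitz with
    # constant 2 * max_k lambda_max(X_k^T X_k / n_k); step = 1 / L keeps ISTA monotone.
    L = 2 * max(np.linalg.eigvalsh(X.T @ X / len(X)).max() for X in Xs)
    step = 1.0 / L
    history = [objective(B, S)]
    for _ in range(n_iter):
        W = B + S
        # Same gradient for B and S, one column per dataset.
        grad = np.column_stack([-X.T @ (y - X @ W[:, k]) / len(y)
                                for k, (X, y) in enumerate(zip(Xs, ys))])
        B_half, S_half = B - step * grad, S - step * grad
        # Prox of the group penalty: shrink each TF row of B toward zero as a whole.
        norms = np.linalg.norm(B_half, axis=1, keepdims=True)
        scale = np.maximum(0.0, 1.0 - step * lam_shared * w[:, None] / np.maximum(norms, 1e-12))
        B = scale * B_half
        # Prox of the elementwise penalty: soft-threshold each entry of S.
        thresh = step * lam_specific * w[:, None]
        S = np.sign(S_half) * np.maximum(np.abs(S_half) - thresh, 0.0)
        history.append(objective(B, S))
    return B, S, history


# Synthetic example: TF 0 regulates the target in every dataset (conserved),
# TF 1 only in the third dataset (dataset-specific).
rng = np.random.default_rng(0)
p, K, n = 6, 3, 80
Xs = [rng.standard_normal((n, p)) for _ in range(K)]
true_W = np.zeros((p, K))
true_W[0, :] = 2.0
true_W[1, 2] = -1.5
ys = [X @ true_W[:, k] + 0.1 * rng.standard_normal(n) for k, X in enumerate(Xs)]
weights = np.ones(p)
weights[0] = 0.5  # hypothetical prior evidence for TF 0 lowers its penalty
B, S, history = fit_shared_specific(Xs, ys, weights, lam_shared=0.15, lam_specific=0.1)
W = B + S
```

Setting lam_specific below lam_shared, but above lam_shared / sqrt(K), makes a regulator active in only one dataset cheaper to model in S and a regulator with equal effects in all K datasets cheaper to model in B. This is a design choice of the sketch, not a setting taken from the paper.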


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f39f/6363223/a1b15a5fff22/pcbi.1006591.g001.jpg
