Multi-omics network-based functional annotation of unknown Arabidopsis genes.

Author information

Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.

Center for Plant Systems Biology, Vlaams Instituut voor Biotechnologie, Ghent, Belgium.

Publication information

Plant J. 2021 Nov;108(4):1193-1212. doi: 10.1111/tpj.15507. Epub 2021 Oct 10.

Abstract

Unraveling gene function is pivotal to understanding the signaling cascades that control plant development and stress responses. As experimental profiling is costly and labor intensive, there is a clear need for high-confidence computational annotation. In contrast to detailed gene-specific functional information, transcriptomics data are widely available for both model and crop species. Here, we describe a novel automated function prediction method, which leverages complementary information from multiple expression datasets by analyzing study-specific gene co-expression networks. First, we benchmarked the prediction performance on recently characterized Arabidopsis thaliana genes, and showed that our method outperforms state-of-the-art expression-based approaches. Next, we predicted biological process annotations for known (n = 15 790) and unknown (n = 11 865) genes in A. thaliana and validated our predictions using experimental protein-DNA and protein-protein interaction data (covering >220 000 interactions in total), obtaining a set of high-confidence functional annotations. Our method assigned at least one validated annotation to 5054 (42.6%) unknown genes, and at least one novel validated function to 3408 (53.0%) genes with computational annotations only. These omics-supported functional annotations shed light on a variety of developmental processes and molecular responses, such as flower and root development, defense responses to fungi and bacteria, and phytohormone signaling, and help fill the information gap on biological process annotations in Arabidopsis. An in-depth analysis of two context-specific networks, modeling seed development and response to water deprivation, shows how previously uncharacterized genes function within the respective networks. Moreover, our automated function prediction approach can be applied in future studies to facilitate gene discovery for crop improvement.
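
The abstract does not spell out the prediction algorithm, so the following is only a minimal sketch of the general idea of annotating a gene from study-specific co-expression networks. It is a generic guilt-by-association vote and is not the authors' method. All names are hypothetical: the gene identifiers (G1, G2, G3, U1), the toy expression values, the function names, and the choice of Pearson correlation with a top-k neighbour cutoff are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def top_neighbors(expr, gene, k):
    """Return the k genes most strongly co-expressed with `gene` in one study."""
    scored = [(pearson(expr[gene], expr[other]), other)
              for other in expr if other != gene]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def predict_terms(studies, annotations, gene, k=2):
    """Count annotation terms among the top-k neighbours in every study.

    Each study contributes its own network, so a term supported by
    several independent datasets collects more votes.
    """
    votes = Counter()
    for expr in studies:
        if gene not in expr:
            continue  # gene not measured in this study
        for neighbor in top_neighbors(expr, gene, k):
            for term in annotations.get(neighbor, ()):
                votes[term] += 1
    return votes.most_common()


# Toy data (hypothetical): two studies, three annotated genes, one unknown gene.
studies = [
    {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8],
     "G3": [4, 3, 2, 1], "U1": [1.1, 2.0, 3.2, 3.9]},
    {"G1": [5, 1, 4, 2], "G3": [1, 5, 2, 4], "U1": [5, 1, 4, 3]},
]
annotations = {
    "G1": {"seed development"},
    "G2": {"seed development"},
    "G3": {"response to water deprivation"},
}

ranked = predict_terms(studies, annotations, "U1", k=2)
print(ranked)
```

In this toy setting the unknown gene U1 collects most of its votes from neighbours annotated with seed development. A real pipeline would additionally need a statistical enrichment test and independent validation, such as the protein-DNA and protein-protein interaction data mentioned in the abstract.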
