
Large-scale gene co-expression network as a source of functional annotation for cattle genes.

Author information

Beiki Hamid, Nejati-Javaremi Ardeshir, Pakdel Abbas, Masoudi-Nejad Ali, Hu Zhi-Liang, Reecy James M

Affiliations

Department of Animal Science, University College of Agriculture and Natural Resources, University of Tehran, Karaj, 31587-11167, Iran.

Department of Animal Science, Iowa State University, Ames, IA, 50011, USA.

Publication information

BMC Genomics. 2016 Nov 2;17(1):846. doi: 10.1186/s12864-016-3176-2.

Abstract

BACKGROUND

Genome sequencing and subsequent gene annotation have led to the elucidation of many genes, but in vertebrates the number of protein-coding genes is very consistent across species (~20,000). Seven years after the cattle genome was sequenced, some genes still have limited annotation, and the functions of many genes are not understood, or are only partly understood at best. Based on the assumption that genes with similar expression patterns across a vast array of tissues and experimental conditions are likely to encode proteins with related functions or to participate in a given pathway, we constructed a genome-wide Cattle Gene Co-expression Network (CGCN) from 72 microarray datasets comprising a total of 1470 Affymetrix Genechip Bovine Genome Arrays retrieved from NCBI GEO or EBI ArrayExpress.
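
The co-expression assumption above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state the similarity measure or cut-off used to build the CGCN, so the Pearson correlation, the hard threshold of 0.8, the unsigned treatment of negative correlations and the probe-set IDs in the example are all illustrative assumptions. Each probe set is represented by its expression values across arrays, and two probe sets are linked when the absolute correlation of their profiles reaches the threshold.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 arrays)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def build_coexpression_network(profiles, threshold=0.8):
    """Link probe sets whose expression profiles are strongly correlated.

    profiles: dict mapping a probe-set ID to its expression values,
              one value per array, in the same array order for every probe set.
    threshold: hypothetical hard cut-off on |r|; not taken from the CGCN study.
    Returns an adjacency dict: probe-set ID -> set of linked probe-set IDs.
    """
    network = {probe: set() for probe in profiles}
    # Compare every unordered pair once; this is O(n^2) in the number of probe sets.
    for a, b in combinations(profiles, 2):
        # Unsigned network (assumption): strong negative correlation also creates a link.
        if abs(pearson(profiles[a], profiles[b])) >= threshold:
            network[a].add(b)
            network[b].add(a)
    return network
```

For example, with hypothetical profiles probe_A = [1, 2, 3, 4, 5], probe_B = [2, 4, 6, 8, 10] and probe_C = [3, 1, 4, 1, 5], only probe_A and probe_B are linked (r = 1), while probe_C (r ≈ 0.35 with both) stays isolated. Because the pairwise loop is quadratic, a real analysis at the scale of thousands of probe sets would compute the correlation matrix with vectorized linear algebra instead.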

RESULTS

A total of 16,607 probe sets, representing 11,397 genes with unique Entrez IDs, were consolidated into 32 co-expression modules containing between 29 and 2569 probe sets each. All identified modules showed strong functional enrichment for Gene Ontology (GO) terms and Reactome pathways. For example, modules with important biological functions such as response to virus, response to bacteria, energy metabolism, cell signaling and cell cycle were identified. Moreover, the co-expression network was used, following the "guilt-by-association" principle, to predict the potential functions of 132 genes with no functional annotation. Four uncharacterized hub genes were identified in modules highly enriched for GO terms related to leukocyte activation (LOC509513), RNA processing (LOC100848208), nucleic acid metabolic process (LOC100850151) and organic-acid metabolic process (MGC137211). Such highly connected genes should be investigated more closely, as they are likely to have key regulatory roles.
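
The guilt-by-association and hub-gene ideas above can be sketched as follows. This is a hypothetical illustration, not the procedure used in the CGCN study: a candidate term is proposed for an unannotated gene when at least `min_support` of its network neighbours carry that term, and hubs are ranked by their number of links within a module (intra-module degree). The function names, the `min_support` default, the gene names and the toy annotations are all assumptions.

```python
from collections import Counter


def predict_terms(gene, network, annotations, min_support=2):
    """Guilt-by-association sketch: propose terms for `gene` from its neighbours.

    network: adjacency dict, gene -> set of co-expressed genes.
    annotations: dict, gene -> set of known functional terms (e.g. GO term names).
    min_support: hypothetical minimum number of neighbours that must share a term.
    Returns proposed terms, most supported first (ties broken alphabetically).
    """
    counts = Counter()
    for neighbour in network.get(gene, set()):
        counts.update(annotations.get(neighbour, set()))
    supported = [(term, n) for term, n in counts.items() if n >= min_support]
    supported.sort(key=lambda item: (-item[1], item[0]))
    return [term for term, _ in supported]


def find_hubs(module, network, top_n=1):
    """Return the `top_n` module members with the most links inside the module."""
    members = set(module)
    # Intra-module degree: count only neighbours that belong to the same module.
    degree = {g: len(network.get(g, set()) & members) for g in members}
    return sorted(members, key=lambda g: (-degree[g], g))[:top_n]
```

In a toy module where an unannotated geneX is linked to g1, g2 and g3, with g1 and g2 both annotated to "leukocyte activation", `predict_terms("geneX", ...)` proposes ["leukocyte activation"], and `find_hubs` returns ["geneX"] because it has the most intra-module links: an unannotated hub inside a functionally enriched module, which is the situation the results describe.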

CONCLUSIONS

We have demonstrated that the CGCN and its corresponding regulons provide rich information for experimental biologists to design experiments, interpret experimental results, and develop novel hypotheses on gene function in this poorly annotated genome. The network is publicly accessible at http://www.animalgenome.org/cgi-bin/host/reecylab/d .
