Prediction and analysis of essential genes using the enrichments of gene ontology and KEGG pathways.

Authors

Chen Lei, Zhang Yu-Hang, Wang ShaoPeng, Zhang YunHua, Huang Tao, Cai Yu-Dong

Affiliations

School of Life Sciences, Shanghai University, Shanghai, People's Republic of China.

College of Information Engineering, Shanghai Maritime University, Shanghai, People's Republic of China.

Publication information

PLoS One. 2017 Sep 5;12(9):e0184129. doi: 10.1371/journal.pone.0184129. eCollection 2017.

Abstract

Identifying essential genes in a given organism is important for research on their fundamental roles in organism survival. Furthermore, uncovering the links between these essential genes and core functions or pathways provides deeper insight into their key roles. In this study, we investigated the essential and non-essential genes reported in a previous study and extracted gene ontology (GO) terms and biological pathways that are important for the determination of essential genes. Through the enrichment theory of GO and KEGG pathways, we encoded each essential/non-essential gene into a vector in which each component represented the relationship between the gene and one GO term or KEGG pathway. To analyze these relationships, the maximum relevance minimum redundancy (mRMR) method was adopted. Then, incremental feature selection (IFS) and a support vector machine (SVM) were employed to extract important GO terms and KEGG pathways. A prediction model was built simultaneously using the extracted GO terms and KEGG pathways, which yielded nearly perfect performance, with a Matthews correlation coefficient of 0.951, for distinguishing essential and non-essential genes. To fully investigate the key factors influencing the fundamental roles of essential genes, the 21 most important GO terms and three KEGG pathways were analyzed in detail. In addition, this study reports several genes that our prediction model predicted to be essential. We suggest that this study provides more functional and pathway information on the essential genes and offers a new way to investigate related problems.
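The abstract names the Matthews correlation coefficient and incremental feature selection without spelling them out, so a minimal sketch may help. It assumes a feature list already ranked by some method such as mRMR and a caller-supplied `evaluate` callback that stands in for training and scoring a classifier (for example, an SVM under cross-validation). The function names and the callback are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Callable, Sequence, Tuple


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[int, float]:
    """Score each top-k prefix of a ranked feature list.

    `evaluate` is a hypothetical callback: it receives the top-k features
    and returns a score such as the MCC of a classifier trained on them.
    Returns the smallest k that reaches the best score, with that score.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])
        if score > best_score:  # strict comparison keeps the smallest best k
            best_k, best_score = k, score
    return best_k, best_score
```

In this sketch the features in the best-scoring prefix play the role of the "extracted" GO terms and KEGG pathways, and the classifier trained on that prefix plays the role of the final prediction model.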

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/886e/5584762/29394a0f9d28/pone.0184129.g001.jpg