

Bioinformatic Prediction of Gene Ontology Terms of Uncharacterized Proteins from Chromosome 11.

Affiliations

Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Cheongju 28119, Republic of Korea.

Division of Convergence Technology, Research Institute of National Cancer Center, Goyang 10408, Republic of Korea.

Publication information

J Proteome Res. 2020 Dec 4;19(12):4907-4912. doi: 10.1021/acs.jproteome.0c00482. Epub 2020 Oct 22.

Abstract

According to the latest version of neXtProt (release 2020-01-17), 71 of the 1254 proteins encoded on chromosome 11 remain functionally uncharacterized despite protein-level evidence of their existence (uPE1s). Because experimental strategies are often time-consuming and labor-intensive, there is a need for a bioinformatics tool that predicts functional annotation. Here, we used I-TASSER/COFACTOR, provided on the neXtProt web site, which predicts gene ontology (GO) terms from the 3D structure of a protein. For a benchmark dataset of 22 PE1 proteins from chromosome 11, I-TASSER/COFACTOR predicted 2413 GO terms. In this study, we developed a filtering algorithm that selects specific GO terms using the GO map generated by I-TASSER/COFACTOR. The 187 specific GO terms that passed the filter showed a higher average precision-recall score than the full set of 2413 predicted GO terms, at least for cellular component terms. Next, we applied the filter to 65 uPE1 proteins of chromosome 11: 409 of the 6684 predicted GO terms survived, including 103 molecular function terms and 142 biological process terms. As representative examples, the predicted cellular component GO terms of CCDC90B, C11orf52, and SMAP were validated by overexpression in 293T cells followed by immunofluorescence staining. We will further study their biological and molecular functions toward the goal of the neXt-CP50 project as a part of C-HPP. All results and programs are shared on GitHub (https://github.com/heeyounh/I-TASSER-COFACTOR-filtering.git).
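The abstract does not spell out the filtering rule, so the following is only a minimal sketch of one plausible reading of "selecting specific GO terms": drop any predicted term that is an ancestor of another predicted term in the GO graph, then compare precision and recall before and after filtering. The function names, the toy term labels, and the parent map are all hypothetical and are not taken from the authors' repository.

```python
from typing import Dict, Set, Tuple


def most_specific_terms(predicted: Set[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Keep only predicted terms that are not an ancestor of another predicted term.

    ASSUMPTION: this "most specific term" rule is an illustration, not the
    published algorithm. `parents` maps each term to its direct parents.
    """
    def ancestors(term: str) -> Set[str]:
        # Walk upward through the graph, collecting every reachable parent.
        seen: Set[str] = set()
        stack = list(parents.get(term, ()))
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(parents.get(parent, ()))
        return seen

    covered: Set[str] = set()
    for term in predicted:
        covered |= ancestors(term)
    return predicted - covered


def precision_recall(predicted: Set[str], annotated: Set[str]) -> Tuple[float, float]:
    """Precision = TP / |predicted|, recall = TP / |annotated|."""
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall


# Toy graph with hypothetical labels standing in for real GO identifiers.
parents = {
    "mitochondrion": {"organelle"},
    "organelle": {"cellular_component"},
    "membrane": {"cellular_component"},
}
predicted = {"cellular_component", "organelle", "mitochondrion", "membrane"}
annotated = {"mitochondrion"}  # hypothetical reference annotation

specific = most_specific_terms(predicted, parents)
print(sorted(specific))                       # ['membrane', 'mitochondrion']
print(precision_recall(predicted, annotated)) # (0.25, 1.0)
print(precision_recall(specific, annotated))  # (0.5, 1.0)
```

In this toy case the filter removes the two generic ancestor terms, which doubles precision while leaving recall unchanged. That mirrors the direction of the effect reported in the abstract, but the numbers here are illustrative only.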

