Fungal genomes: suffering with functional annotation errors.

Authors

Mohanta Tapan Kumar, Al-Harrasi Ahmed

Affiliation

Natural and Medical Sciences Research Center, University of Nizwa, 616, Nizwa, Oman.

Publication

IMA Fungus. 2021 Nov 1;12(1):32. doi: 10.1186/s43008-021-00083-x.

Abstract

BACKGROUND

As of October 2021, genome sequence data for more than 65,985 species were publicly available in the National Center for Biotechnology Information (NCBI) database alone; additional genome sequences are available in other databases, and the total continues to grow rapidly. Error-free functional annotation of these genomes is essential if the research community is to use these data fully and efficiently.

RESULTS

An analysis of proteome sequence data from 689 fungal species (7.15 million protein sequences) was conducted to identify functional annotation errors. The analysis targeted proteins associated with calcium signaling events, including calcium-dependent protein kinases (CDPKs), calmodulins (CaMs), and calmodulin-like (CML) proteins, as well as WRKY transcription factors, selenoproteins, and proteins associated with the terpene biosynthesis pathway. Genes associated with CDPKs and selenoproteins are known to be absent from fungal genomes. Our analysis nevertheless revealed proteins that were functionally annotated as CDPKs. InterProScan analysis indicated that none of the protein sequences annotated as "calcium dependent protein kinase" encoded calcium-binding EF-hands in the regulatory domain. Similarly, none of the protein sequences annotated as "selenocysteine" contained a Sec (U) amino acid. Proteins annotated as CaMs and CMLs also showed significant discrepancies. CaM proteins should contain four calcium-binding EF-hands; however, the fungal proteins annotated as CaMs contained between two and four. Similarly, CMLs should possess four calcium-binding EF-hands, but some of the fungal proteins annotated as CMLs possessed either three or four. WRKY transcription factors are characterized by the presence of a WRKY domain and are confined to the plant kingdom. Several fungal proteins, however, were annotated as WRKY transcription factors even though they did not contain a WRKY domain.
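The consistency checks described in these results can be illustrated with a short sketch. This is not the authors' pipeline; it only shows the kind of rule being applied. It assumes InterProScan's tab-separated output (protein accession in the first column, signature accession in the fifth) and assumes that each hit to the PROSITE profile PS50222 corresponds to one EF-hand. The function names, the label-matching rules, and the protein identifiers in the usage note are hypothetical.

```python
import csv
import io
from collections import Counter

# Assumption: one hit to PROSITE profile PS50222 is counted as one EF-hand.
EF_HAND_SIGNATURE = "PS50222"


def count_ef_hands(interproscan_tsv):
    """Count EF-hand hits per protein from InterProScan TSV text."""
    counts = Counter()
    reader = csv.reader(
        io.StringIO(interproscan_tsv), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        if len(row) < 5:
            continue  # skip blank or malformed lines
        protein_id, signature = row[0], row[4]
        if signature == EF_HAND_SIGNATURE:
            counts[protein_id] += 1
    return counts


def flag_annotations(proteins, ef_counts):
    """Flag proteins whose annotation disagrees with their sequence features.

    proteins: dict mapping protein id -> (annotation label, amino acid sequence)
    ef_counts: mapping of protein id -> number of EF-hands detected
    Returns a dict mapping flagged protein id -> reason.
    """
    flagged = {}
    for pid, (annotation, sequence) in proteins.items():
        # Crude normalisation so "calcium-dependent" matches "calcium dependent".
        label = annotation.lower().replace("-", " ")
        n_ef = ef_counts.get(pid, 0)
        if "calcium dependent protein kinase" in label:
            if n_ef == 0:
                flagged[pid] = "annotated as CDPK but no EF-hand detected"
        elif label.startswith("calmodulin") and "kinase" not in label:
            # Covers both CaM and CML labels; four EF-hands are expected.
            if n_ef != 4:
                flagged[pid] = f"annotated as CaM/CML but has {n_ef} EF-hands"
        elif "selenocysteine" in label:
            if "U" not in sequence.upper():
                flagged[pid] = "annotated as selenocysteine but no Sec (U) residue"
    return flagged
```

Given the InterProScan TSV text for a proteome and a dictionary of annotations and sequences, `flag_annotations(proteins, count_ef_hands(tsv_text))` returns the identifiers whose labels are not supported by the detected features. Matching on label text is deliberately simple here; a real analysis would need to handle the many spellings that appear in database annotations.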

CONCLUSION

The presence of functional annotation errors in fungal genome and proteome databases is of considerable concern and needs to be addressed in a timely manner.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/439f/8559351/22c2bdde3c07/43008_2021_83_Fig1_HTML.jpg