An integrative method for identifying the over-annotated protein-coding genes in microbial genomes.

Affiliation

State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.

Publication information

DNA Res. 2011 Dec;18(6):435-49. doi: 10.1093/dnares/dsr030. Epub 2011 Sep 8.

DOI:10.1093/dnares/dsr030
PMID:21903723
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3223076/
Abstract

Falsely annotated protein-coding genes are considered one of the major causes of annotation errors in public databases. Although many filtering approaches have been designed for over-annotated protein-coding genes, some are questionable because they increase the number of false negatives. Furthermore, no webserver or software has been devised specifically for the problem of over-annotation. In this study, we propose an integrative algorithm for detecting over-annotated protein-coding genes in microorganisms. Overall, an average accuracy of 99.94% is achieved over 61 microbial genomes. This extremely high accuracy indicates that the presented algorithm is effective at differentiating protein-coding genes from non-coding open reading frames. Extensive analyses show that the predictions are reliable and that the integrative algorithm is robust and convenient. Our analysis also indicates that over-annotated protein-coding genes can cause false positives in the detection of horizontal gene transfer. The webserver of the proposed algorithm is freely accessible at www.cbi.seu.edu.cn/RPGM.
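The abstract reports an accuracy averaged over 61 genomes but does not say how accuracy is defined. The sketch below assumes the simplest reading: for each genome, the fraction of open reading frames whose predicted class (coding or non-coding) matches a reference label, followed by an unweighted mean across genomes. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def accuracy(labels: List[bool], predictions: List[bool]) -> float:
    """Fraction of ORFs whose predicted class matches the reference class.

    True means "protein-coding", False means "non-coding ORF".
    This definition is an assumption; the paper may use a different one.
    """
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and equal in length")
    correct = sum(1 for ref, pred in zip(labels, predictions) if ref == pred)
    return correct / len(labels)


def mean_accuracy(per_genome: List[Tuple[List[bool], List[bool]]]) -> float:
    """Unweighted mean of per-genome accuracies (each genome counts equally)."""
    if not per_genome:
        raise ValueError("at least one genome is required")
    scores = [accuracy(labels, preds) for labels, preds in per_genome]
    return sum(scores) / len(scores)


# Toy data for two hypothetical genomes: (reference labels, predicted labels).
toy_genomes = [
    ([True, True, False, False], [True, True, False, True]),  # 3 of 4 correct
    ([True, False], [True, False]),                           # 2 of 2 correct
]

average = mean_accuracy(toy_genomes)
print(f"mean accuracy over {len(toy_genomes)} genomes: {average:.4f}")
```

An unweighted mean treats a small genome the same as a large one. Pooling all ORFs before dividing would weight genomes by size and can give a different figure.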

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/522c/3223076/4938f4d68c1c/dsr03001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/522c/3223076/a5bd5a6ed118/dsr03002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/522c/3223076/839f7d9d0365/dsr03003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/522c/3223076/2f5fe525f59f/dsr03004.jpg
