Reannotation of protein-coding genes based on an improved graphical representation of DNA sequence.

Affiliation

State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, People's Republic of China.

Publication information

J Comput Chem. 2010 Aug;31(11):2126-35. doi: 10.1002/jcc.21500.

DOI:10.1002/jcc.21500
PMID:20175214
Abstract

Over-annotation of protein-coding genes is a common phenomenon in microbial genomes. The genome of Amsacta moorei entomopoxvirus (AmEPV) is a typical case: more than 63% of its annotated ORFs are hypothetical. In this article, we propose an improved graphical representation, the I-TN (improved curve based on trinucleotides) curve, which allows direct inspection of the composition and distribution of codons and of asymmetric gene structure. This representation can also provide convenient tools for genome analysis. From it, 18 variables are derived as numerical descriptors that represent the specific features of protein-coding genes quantitatively, and we use them to reannotate the protein-coding genes in several viral genomes. Using parameters trained on the experimentally validated genes, all 30 experimentally validated genes and all 63 putative genes in the AmEPV genome are correctly recognized as protein coding; the accuracy of the method is 100% in both the self-test and cross-validation. Twenty-eight annotated hypothetical genes are predicted to be noncoding, so the number of protein-coding genes in AmEPV after reannotation should be 266 rather than the 294 reported in the original annotation. When the method trained on AmEPV is applied directly to other entomopoxvirus genomes, such as Melanoplus sanguinipes entomopoxvirus (MsEPV), all 123 annotated function-known and putative genes are correctly recognized as protein coding, and 17 hypothetical genes are recognized as noncoding. The method could also be extended to other genomes with high accuracy, with or without adapting the training set.
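The abstract does not define the I-TN curve or its 18 descriptors, so the following is only a minimal sketch of the general idea of turning codon composition into numerical features. It is not the authors' method. The function names, the choice of 12 position-specific base frequencies, and the toy sequence are all assumptions made for illustration.

```python
from collections import Counter

def codon_composition(orf: str) -> Counter:
    """Count in-frame trinucleotides (codons), ignoring a trailing partial codon."""
    seq = orf.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))

def position_base_frequencies(orf: str) -> dict:
    """Frequency of A, C, G, T at codon positions 1, 2 and 3 (12 values).

    Hypothetical descriptors for illustration only; they are NOT the
    18 variables used in the paper, which the abstract does not list.
    """
    seq = orf.upper()
    usable = len(seq) - len(seq) % 3
    n_codons = usable // 3
    freqs = {}
    for pos in range(3):
        # Every third base starting at this codon position.
        counts = Counter(seq[pos:usable:3])
        for base in "ACGT":
            freqs[f"{base}{pos + 1}"] = counts[base] / n_codons if n_codons else 0.0
    return freqs

# Toy ORF (an assumption, not a real AmEPV gene): ATG AAA TTT GGG AAA TAA
toy_orf = "ATGAAATTTGGGAAATAA"
composition = codon_composition(toy_orf)
frequencies = position_base_frequencies(toy_orf)
```

Feature vectors of this kind could then be passed to a classifier trained on experimentally validated genes, which is the workflow the abstract outlines. The abstract does not specify the classifier or its parameters.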

Similar articles

1. Reannotation of protein-coding genes based on an improved graphical representation of DNA sequence.
   J Comput Chem. 2010 Aug;31(11):2126-35. doi: 10.1002/jcc.21500.
2. Re-prediction of protein-coding genes in the genome of Amsacta moorei entomopoxvirus.
   J Virol Methods. 2007 Dec;146(1-2):389-92. doi: 10.1016/j.jviromet.2007.07.010. Epub 2007 Aug 23.
3. Complete genomic sequence of the Amsacta moorei entomopoxvirus: analysis and comparison with other poxviruses.
   Virology. 2000 Aug 15;274(1):120-39. doi: 10.1006/viro.2000.0449.
4. Reannotation of hypothetical ORFs in plant pathogen Erwinia carotovora subsp. atroseptica SCRI1043.
   FEBS J. 2008 Jan;275(1):198-206. doi: 10.1111/j.1742-4658.2007.06190.x. Epub 2007 Dec 7.
5. [Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
   Yi Chuan Xue Bao. 2004 May;31(5):431-43.
6. Identification of an Amsacta spheroidin-like protein within the occlusion bodies of Choristoneura entomopoxviruses.
   Virology. 1993 Jan;192(1):179-87. doi: 10.1006/viro.1993.1020.
7. A homolog of the vaccinia virus D13L rifampicin resistance gene is in the entomopoxvirus of the parasitic wasp, Diachasmimorpha longicaudata.
   J Insect Sci. 2008;8:8. doi: 10.1673/031.008.0801.
8. The genome of Melanoplus sanguinipes entomopoxvirus.
   J Virol. 1999 Jan;73(1):533-52. doi: 10.1128/JVI.73.1.533-552.1999.
9. The Melolontha melolontha entomopoxvirus (MmEPV) fusolin is related to the fusolins of lepidopteran EPVs and to the 37K baculovirus glycoprotein.
   Virology. 1995 Apr 20;208(2):427-36. doi: 10.1006/viro.1995.1173.
10. Gene recognition from questionable ORFs in bacterial and archaeal genomes.
    J Biomol Struct Dyn. 2003 Aug;21(1):99-109. doi: 10.1080/07391102.2003.10506908.

Cited by

1. Graphical and numerical representations of DNA sequences: statistical aspects of similarity.
   J Math Chem. 2011;49(10):2345. doi: 10.1007/s10910-011-9890-8. Epub 2011 Aug 28.
2. Re-annotation of protein-coding genes in 10 complete genomes of Neisseriaceae family by combining similarity-based and composition-based methods.
   DNA Res. 2013 Jun;20(3):273-86. doi: 10.1093/dnares/dst009. Epub 2013 Apr 9.
3. Enhancement of crystallization with nucleotide ligands identified by dye-ligand affinity chromatography.
   J Struct Funct Genomics. 2012 Jun;13(2):71-9. doi: 10.1007/s10969-012-9124-8.
4. An integrative method for identifying the over-annotated protein-coding genes in microbial genomes.
   DNA Res. 2011 Dec;18(6):435-49. doi: 10.1093/dnares/dsr030. Epub 2011 Sep 8.