Detecting overlapping coding sequences in virus genomes.

Author information

Firth Andrew E, Brown Chris M

Affiliation

Department of Biochemistry, University of Otago, PO Box 56, Dunedin, New Zealand.

Publication information

BMC Bioinformatics. 2006 Feb 16;7:75. doi: 10.1186/1471-2105-7-75.

Abstract

BACKGROUND

Detecting new coding sequences (CDSs) in viral genomes can be difficult for several reasons. The typically compact genomes often contain a number of overlapping coding and non-coding functional elements, which can result in unusual patterns of codon usage; conservation between related sequences can be difficult to interpret, especially within overlapping genes; and viruses often employ non-canonical translational mechanisms (e.g. frameshifting, stop codon read-through, leaky scanning and internal ribosome entry sites), which can conceal potentially coding open reading frames (ORFs).
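
To make the idea of overlapping ORFs concrete, the following is a minimal Python sketch. It is not taken from the paper and has nothing to do with how MLOGD scores coding potential; it simply scans the three forward reading frames for ATG-to-stop ORFs and reports pairs in different frames whose nucleotide spans intersect. The function names `find_orfs` and `overlapping_pairs` are hypothetical, and the scan deliberately ignores the reverse strand, alternative start codons and the non-canonical mechanisms listed above (frameshifting, read-through, leaky scanning, IRESs), so it would miss exactly the kind of concealed ORFs the paper is concerned with.

```python
# Hypothetical illustration only: a naive forward-strand ORF scan.
# This is NOT the MLOGD statistic described in the paper.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG-initiated ORFs on the forward strand.

    `start` is the 0-based index of the ATG; `end` is exclusive and includes
    the stop codon. Only the first ATG before each stop opens an ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


def overlapping_pairs(orfs):
    """Pairs of ORFs in different frames whose half-open spans intersect."""
    pairs = []
    for idx, a in enumerate(orfs):
        for b in orfs[idx + 1:]:
            if a[0] != b[0] and a[1] < b[2] and b[1] < a[2]:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    demo = "ATGCATGCCTAACTAG"  # toy sequence, frames 0 and 1 overlap
    found = find_orfs(demo)
    print(found)
    print(overlapping_pairs(found))
```

On the toy sequence, frame 0 reads ATG CAT GCC TAA and frame 1 reads ATG CCT AAC TAG starting four nucleotides in, so the two ORFs share nucleotides 4 to 11. Real viral annotation needs far more than this (alignments, codon-usage and substitution models such as the one MLOGD uses), which is precisely why simple ORF scans are unreliable for overlapping genes.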

RESULTS

In a previous paper we introduced a new statistic, MLOGD (Maximum Likelihood Overlapping Gene Detector), for detecting and analysing overlapping CDSs. Here we present (a) an improved MLOGD statistic, (b) a greatly extended suite of software using MLOGD, (c) a database of results for 640 virus sequence alignments, and (d) a web-interface to the software and database. Tests show that, from an alignment with just 20 mutations, MLOGD can discriminate non-overlapping CDSs from non-coding ORFs with a typical accuracy of up to 98%, and can detect CDSs overlapping known CDSs with a typical accuracy of 90%. In addition, the software produces a variety of statistics and graphics, useful for analysing an input multiple sequence alignment.

CONCLUSION

MLOGD is an easy-to-use tool for virus genome annotation, for detecting new CDSs (in particular overlapping or short CDSs), and for analysing overlapping CDSs following frameshift sites. The software, web-server, database and supplementary material are available at http://guinevere.otago.ac.nz/mlogd.html.
