Delcher Arthur L, Bratke Kirsten A, Powers Edwin C, Salzberg Steven L
Center for Bioinformatics & Computational Biology, University of Maryland, College Park, MD 20742, USA.
Bioinformatics. 2007 Mar 15;23(6):673-9. doi: 10.1093/bioinformatics/btm009. Epub 2007 Jan 19.
The Glimmer gene-finding software has been successfully used for finding genes in bacteria, archaea and viruses representing hundreds of species. We describe several major changes to the Glimmer system, including improved methods for identifying both coding regions and start codons. We also describe a new module of Glimmer that can distinguish host and endosymbiont DNA. This module was developed in response to the discovery that eukaryotic genome sequencing projects sometimes inadvertently capture the DNA of intracellular bacteria living in the host.
The new methods dramatically reduce the rate of false-positive predictions while maintaining Glimmer's 99% sensitivity for detecting genes in most species, and they find substantially more correct start sites, as measured by comparisons to known and well-curated genes. We show that our interpolated Markov model (IMM) DNA discriminator correctly separated 99% of the sequences in a recent genome project that produced a mixture of sequences from the bacterium Prochloron didemni and its sea squirt host, Lissoclinum patella.
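To make the idea of a Markov-model DNA discriminator concrete, the following is a minimal sketch, not Glimmer's implementation. It assumes a simpler technique than the one in the abstract: a fixed-order Markov model with add-one smoothing rather than an interpolated Markov model, which combines evidence from several context lengths. The function names (`train_markov`, `log_likelihood`, `classify`), the order of 2, and the toy training sequences are all hypothetical choices for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, order=2, pseudocount=1.0):
    """Estimate log P(base | preceding `order` bases) from training sequences.

    Assumption: a fixed-order model with additive smoothing, standing in
    for the interpolated model described in the abstract.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip windows containing ambiguous characters such as N.
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1
    model = {}
    for context, base_counts in counts.items():
        total = sum(base_counts.values()) + pseudocount * len(ALPHABET)
        model[context] = {
            b: math.log((base_counts.get(b, 0.0) + pseudocount) / total)
            for b in ALPHABET
        }
    return model


def log_likelihood(seq, model, order=2):
    """Sum log-probabilities of each base given its context.

    Contexts never seen in training fall back to a uniform distribution.
    """
    seq = seq.upper()
    uniform = math.log(1.0 / len(ALPHABET))
    total = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        if base not in ALPHABET:
            continue
        total += model.get(context, {}).get(base, uniform)
    return total


def classify(seq, host_model, symbiont_model, order=2):
    """Label a sequence by the sign of the log-likelihood ratio."""
    score = log_likelihood(seq, symbiont_model, order) - log_likelihood(
        seq, host_model, order
    )
    return "endosymbiont" if score > 0 else "host"


# Toy training data (hypothetical): AT-rich "host", GC-rich "endosymbiont".
host_model = train_markov(["ATATATATTTAAATATAT" * 5])
symbiont_model = train_markov(["GCGCGGCCGCGCGGGCCC" * 5])

label_gc = classify("GCGCGCGCGC", host_model, symbiont_model)
label_at = classify("ATATATATAT", host_model, symbiont_model)
```

In this sketch each read is assigned to whichever model gives it the higher likelihood. A realistic discriminator would be trained on much larger sequence sets and would use longer, interpolated contexts.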
Glimmer is OSI Certified Open Source and available at http://cbcb.umd.edu/software/glimmer.