Parallel Computing and Complex Systems Group, Department of Computer Science, University Leipzig, Augustusplatz 10-11, 04109 Leipzig, Germany.
Mol Phylogenet Evol. 2013 Nov;69(2):313-9. doi: 10.1016/j.ympev.2012.08.023. Epub 2012 Sep 7.
About 2000 completely sequenced mitochondrial genomes are available from the NCBI RefSeq database, together with manually curated annotations of their protein-coding genes, rRNAs, and tRNAs. This annotation information, which has accumulated over two decades, has been obtained with a diverse set of computational tools and annotation strategies. Despite all efforts at manual curation, it is still plagued by misassignments of reading directions, erroneous gene names, and missing as well as false-positive annotations, in particular for the RNA genes. Taken together, these problems cause substantial difficulties for fully automatic pipelines that aim to use these data comprehensively for studies of animal phylogenetics and the molecular evolution of mitogenomes. The MITOS pipeline is designed to compute a consistent de novo annotation of the mitogenomic sequences. We show that the results of MITOS match RefSeq and MitoZoa in terms of annotation coverage and quality. At the same time, we avoid biases, inconsistencies of nomenclature, and typos originating from manual curation strategies. The MITOS pipeline is accessible online at http://mitos.bioinf.uni-leipzig.de.