Accurate annotation of protein-coding genes in mitochondrial genomes.

Authors

Al Arab Marwa, Höner Zu Siederdissen Christian, Tout Kifah, Sahyoun Abdullah H, Stadler Peter F, Bernt Matthias

Affiliations

Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany; Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany; Doctoral School of Science and Technology, AZM Center for Biotechnology Research, Lebanese University, Tripoli, Lebanon.

Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany; Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany; Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria.

Publication information

Mol Phylogenet Evol. 2017 Jan;106:209-216. doi: 10.1016/j.ympev.2016.09.024. Epub 2016 Sep 28.

Abstract

Mitochondrial genome sequences are available in large numbers, and new sequences are being published at an increasing pace. Fast, automatic, consistent, and high-quality annotations are a prerequisite for downstream analyses. Therefore, we present an automated pipeline for fast de novo annotation of mitochondrial protein-coding genes. The annotation is based on enhanced phylogeny-aware hidden Markov models (HMMs). Using an approximation of the phylogeny, the pipeline builds taxon-specific enhanced multiple sequence alignments (MSAs) of already annotated sequences, together with the corresponding HMMs. The MSAs are enhanced by fixing unannotated frameshifts, purging wrong sequences, and removing non-conserved columns from both ends. A comparison with reference annotations highlights the high quality of the results. The frameshift correction method predicts a large number of frameshifts, many of which were previously unknown. A detailed analysis of the frameshifts in nad3 of the Archosauria-Testudines group was conducted.
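
One of the MSA enhancement steps named in the abstract, removing non-conserved columns from both ends, can be illustrated with a minimal Python sketch. The abstract does not state which conservation criterion the pipeline uses, so the scoring rule here (fraction of rows sharing the most common non-gap residue), the 0.5 threshold, and the function names `column_conservation` and `trim_alignment_ends` are all assumptions chosen for illustration, not the authors' method.

```python
from collections import Counter


def column_conservation(column: str) -> float:
    """Fraction of rows that share the most common non-gap residue.

    Assumed scoring rule for illustration only; gaps ('-') count
    against conservation because the denominator is the full column.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return most_common_count / len(column)


def trim_alignment_ends(rows: list[str], threshold: float = 0.5) -> list[str]:
    """Drop leading and trailing columns whose conservation is below threshold.

    Interior columns are kept even if poorly conserved; only the two
    ends of the alignment are trimmed.
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all alignment rows must have the same length")

    columns = ["".join(r[i] for r in rows) for i in range(width)]
    conserved = [column_conservation(c) >= threshold for c in columns]
    if not any(conserved):
        # No column passes the threshold: nothing is left after trimming.
        return ["" for _ in rows]

    start = conserved.index(True)                # first conserved column
    end = width - conserved[::-1].index(True)    # one past the last conserved column
    return [r[start:end] for r in rows]


if __name__ == "__main__":
    # Toy protein alignment (hypothetical data, not from the paper).
    alignment = ["-MKLV-A", "AMKLVG-", "-MKLV--", "TMRLV-C"]
    for row in trim_alignment_ends(alignment):
        print(row)
```

In this toy alignment the first column and the last two columns are mostly gaps or disagreeing residues, so they are trimmed, while the partially variable interior column (K/R) is kept. A real pipeline would likely use a model-aware criterion rather than a fixed majority threshold.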

