Mitochondrial genome annotation with MFannot: a critical analysis of gene identification and gene model prediction.

Authors

Lang B Franz, Beck Natacha, Prince Samuel, Sarrasin Matt, Rioux Pierre, Burger Gertraud

Affiliation

Robert Cedergren Center for Bioinformatics and Genomics, Département de Biochimie, Université de Montréal, Montréal, QC, Canada.

Publication information

Front Plant Sci. 2023 Jul 4;14:1222186. doi: 10.3389/fpls.2023.1222186. eCollection 2023.

Abstract

Compared to nuclear genomes, mitochondrial genomes (mitogenomes) are small and usually code for only a few dozen genes. Still, identifying genes and their structure can be challenging and time-consuming. Even automated tools for mitochondrial genome annotation often require manual analysis and curation by skilled experts. The most difficult steps are (i) the structural modelling of intron-containing genes; (ii) the identification and delineation of Group I and II introns; and (iii) the identification of moderately conserved, non-coding RNA (ncRNA) genes specifying 5S rRNAs, tmRNAs and RNase P RNAs. Additional challenges arise from genetic code evolution, which can redefine the translational identity of both start and stop codons, thus obscuring protein-coding genes. Further, RNA editing can render gene identification difficult, if not impossible, without additional RNA sequence data. Current automated mito- and plastid-genome annotators are limited because they are typically tailored to specific eukaryotic groups. The MFannot annotator we developed is unique in its applicability to a broad taxonomic scope, its accuracy in gene model inference, and its capabilities in intron identification and classification. The pipeline leverages curated profile Hidden Markov Models (HMMs), covariance models (CMs), and ERPIN models to better capture evolutionarily conserved signatures in the primary sequence (HMMs and CMs) as well as in secondary structure (CMs and ERPIN). Here we formally describe MFannot, which has been available as a web-accessible service (https://megasun.bch.umontreal.ca/apps/mfannot/) to the research community for nearly 16 years. Further, we report its performance on particularly intron-rich mitogenomes and describe ongoing and future developments.
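To illustrate why a reassigned stop codon can obscure a protein-coding gene, here is a minimal Python sketch. It is not MFannot code. The function name `orf_codon_count` and the toy sequence are hypothetical, and the only biological fact relied upon is that TGA is a stop codon in NCBI translation table 1 (standard) but encodes tryptophan in translation table 4, which is used by many mitochondrial genomes.

```python
# Illustrative sketch only; names and the toy sequence are assumptions,
# not part of MFannot.

# Stop codons under two NCBI translation tables.
STOPS_BY_TABLE = {
    1: {"TAA", "TAG", "TGA"},  # table 1 (standard code)
    4: {"TAA", "TAG"},         # table 4: TGA is read as Trp, not stop
}


def orf_codon_count(seq: str, table: int) -> int:
    """Count codons from the start of seq up to, but not including,
    the first in-frame stop codon under the given translation table."""
    stops = STOPS_BY_TABLE[table]
    seq = seq.upper()
    count = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            break
        count += 1
    return count


# Toy reading frame: ATG TTT TGA GGT CCA TAA
toy = "ATGTTTTGAGGTCCATAA"

print(orf_codon_count(toy, 1))  # reading frame truncated at TGA
print(orf_codon_count(toy, 4))  # TGA read through; frame ends at TAA
```

Under the standard code the toy reading frame ends after two codons, whereas under table 4 it extends to five. An annotator that assumes the wrong code would therefore report a truncated or fragmented gene.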
