

MitoHiFi: a python pipeline for mitochondrial genome assembly from PacBio high fidelity reads.

Affiliations

Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK.

Bio Bureau Biotecnologia, Rio de Janeiro, Brazil.

Publication information

BMC Bioinformatics. 2023 Jul 18;24(1):288. doi: 10.1186/s12859-023-05385-y.

Abstract

BACKGROUND

PacBio high fidelity (HiFi) sequencing reads are both long (15-20 kb) and highly accurate (>Q20). Because of these properties, they have revolutionised genome assembly, leading to more accurate and contiguous genomes. In eukaryotes, the mitochondrial genome is sequenced alongside the nuclear genome, often at very high coverage. A dedicated tool for mitochondrial genome assembly using HiFi reads is still missing.
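
To make the ">Q20" figure concrete: a Phred quality score Q corresponds to an error probability P = 10^(-Q/10), so Q20 means at most a 1% chance of error per base. The sketch below is illustrative only and is not part of MitoHiFi; the function names are hypothetical, and the expected-error estimate assumes a uniform per-base error rate, which is a simplification.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a per-base error probability back to a Phred quality score."""
    return -10 * math.log10(p)


def expected_errors(read_length: int, q: float) -> float:
    """Expected number of erroneous bases in a read.

    Assumes every base has the same error probability (a simplification).
    """
    return read_length * phred_to_error_probability(q)


if __name__ == "__main__":
    # Q20 is the lower bound quoted for HiFi reads in the abstract.
    print(phred_to_error_probability(20))
    # Upper bound on expected errors for a 15 kb read sitting exactly at Q20.
    print(expected_errors(15_000, 20))
```

Under that assumption, a 15 kb read at exactly Q20 would carry about 150 erroneous bases; since Q20 is a lower bound, this is a worst-case figure rather than a typical one.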

RESULTS

MitoHiFi was developed within the Darwin Tree of Life Project to assemble mitochondrial genomes from the HiFi reads generated for target species. The input for MitoHiFi is either the raw reads or the assembled contigs, and the tool outputs a mitochondrial genome sequence fasta file along with annotation of protein and RNA genes. Variants arising from heteroplasmy are assembled independently, and nuclear insertions of mitochondrial sequences are identified and not used in organellar genome assembly. MitoHiFi has been used to assemble 374 mitochondrial genomes (368 Metazoa and 6 Fungi species) for the Darwin Tree of Life Project, the Vertebrate Genomes Project and the Aquatic Symbiosis Genome Project. Inspection of 60 mitochondrial genomes assembled with MitoHiFi for species that already have reference sequences in public databases showed the widespread presence of previously unreported repeats.
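
Because the tool's primary output is a FASTA file, a quick sanity check on an assembled mitochondrial genome is to read the record and report its length and GC content. The following is a minimal sketch using only the Python standard library; it is not MitoHiFi code, the helper names are hypothetical, and the toy record stands in for a real output file, whose name is not given in the abstract.

```python
from io import StringIO
from typing import Dict, TextIO


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA text into a {record_id: sequence} dictionary."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            fields = line[1:].split()
            # Use the first whitespace-delimited token as the record ID.
            name = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def gc_fraction(sequence: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Toy record for illustration only; a real mitogenome is typically many kilobases long.
example = StringIO(">toy_mitogenome hypothetical record\nATGCGC\nATAT\n")
records = read_fasta(example)

if __name__ == "__main__":
    for record_id, seq in records.items():
        print(record_id, len(seq), gc_fraction(seq))
```

To inspect a real file, the `StringIO` object can be replaced with `open(path)` on the FASTA file produced by the pipeline.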

CONCLUSIONS

MitoHiFi is able to assemble mitochondrial genomes from a wide phylogenetic range of taxa from PacBio HiFi data. MitoHiFi is written in Python and is freely available on GitHub (https://github.com/marcelauliano/MitoHiFi). MitoHiFi is available with its dependencies as a Docker container on GitHub (ghcr.io/marcelauliano/mitohifi:master).
