

Plastid Genome Assembly Using Long-read data.

Affiliations

Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.

Laboratorio de Biotecnología Vegetal, Universidad San Francisco de Quito USFQ, Quito, Ecuador.

Publication information

Mol Ecol Resour. 2023 Aug;23(6):1442-1457. doi: 10.1111/1755-0998.13787. Epub 2023 Apr 2.

Abstract

Although plastid genome (plastome) structure is highly conserved across most seed plants, investigations during the past two decades have revealed several disparately related lineages that experienced substantial rearrangements. Most plastomes contain a large inverted repeat and two single-copy regions, and a few dispersed repeats; however, the plastomes of some taxa harbour long repeat sequences (>300 bp). These long repeats make it challenging to assemble complete plastomes using short-read data, leading to misassemblies and consensus sequences with spurious rearrangements. Single-molecule, long-read sequencing has the potential to overcome these challenges, yet there is no consensus on the most effective method for accurately assembling plastomes using long-read data. We generated a pipeline, plastid Genome Assembly Using Long-read data (ptGAUL), to address the problem of plastome assembly using long-read data from Oxford Nanopore Technologies (ONT) or Pacific Biosciences platforms. We demonstrated the efficacy of the ptGAUL pipeline using 16 published long-read data sets. We showed that ptGAUL quickly produces accurate and unbiased assemblies using only ~50× coverage of plastome data. Additionally, we deployed ptGAUL to assemble four new Juncus (Juncaceae) plastomes using ONT long reads. Our results revealed many long repeats and rearrangements in Juncus plastomes compared with basal lineages of Poales. The ptGAUL pipeline is available on GitHub: https://github.com/Bean061/ptgaul.
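The two quantitative points in the abstract (roughly 50× plastome coverage suffices, and repeats over ~300 bp defeat short reads) come down to simple arithmetic. The following is a minimal sketch, not part of ptGAUL: mean depth is total sequenced bases divided by plastome size, and a read can only resolve a repeat if it spans the repeat plus some unique flanking sequence on both sides. The function names, the 500 bp flank, the longest-first subsampling strategy, and the 160 kb plastome size are all hypothetical choices made for illustration.

```python
"""Back-of-the-envelope checks for plastome long-read data (illustrative sketch).

Assumptions (not taken from ptGAUL itself): read lengths are already extracted
into a list of ints, the plastome size is known or estimated, and a read
"bridges" a repeat if it is at least repeat_len + 2 * flank bases long.
"""
from typing import Iterable, List


def coverage(read_lengths: Iterable[int], genome_size: int) -> float:
    """Mean depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def subsample_to_coverage(read_lengths: List[int], genome_size: int,
                          target: float = 50.0) -> List[int]:
    """Keep the longest reads until the target depth is reached.

    Longest-first selection is one simple strategy; it is an assumption here,
    not necessarily what ptGAUL does.
    """
    needed = target * genome_size
    kept: List[int] = []
    total = 0
    for length in sorted(read_lengths, reverse=True):
        if total >= needed:
            break
        kept.append(length)
        total += length
    return kept


def bridging_reads(read_lengths: Iterable[int], repeat_len: int,
                   flank: int = 500) -> int:
    """Count reads long enough to span a repeat plus unique flanks on both sides."""
    min_len = repeat_len + 2 * flank
    return sum(1 for length in read_lengths if length >= min_len)


if __name__ == "__main__":
    plastome_size = 160_000          # hypothetical plastome size (bp)
    reads = [20_000] * 500           # hypothetical: 500 long reads of 20 kb each
    print(coverage(reads, plastome_size))                              # 62.5
    print(len(subsample_to_coverage(reads, plastome_size, 50)))        # 400
    print(bridging_reads([150, 300, 2_000, 20_000], repeat_len=300))   # 2
```

In the last line, the 150 bp and 300 bp reads (typical short-read lengths) cannot bridge a 300 bp repeat with its flanks, whereas the 2 kb and 20 kb long reads can, which is the intuition behind using long reads for repeat-rich plastomes.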



Cited by

Complete sequence of the chloroplast genome determined by long-read sequencing.
Mitochondrial DNA B Resour. 2025 Jul 23;10(8):753-757. doi: 10.1080/23802359.2025.2535628. eCollection 2025.

The mitochondrial genome of H. Lév. & Vaniot, an endemic sedge in Korea.
Mitochondrial DNA B Resour. 2025 Jan 7;10(1):88-93. doi: 10.1080/23802359.2024.2449090. eCollection 2025.
