SMRT sequencing only de novo assembly of the sugar beet (Beta vulgaris) chloroplast genome.

Author information

Stadermann Kai Bernd, Weisshaar Bernd, Holtgräwe Daniela

Affiliations

Chair of Genome Research, Faculty of Biology, Bielefeld University, Bielefeld, Germany.

Bioinformatics Resource Facility, Centre for Biotechnology, Bielefeld University, Bielefeld, Germany.

Publication information

BMC Bioinformatics. 2015 Sep 16;16(1):295. doi: 10.1186/s12859-015-0726-6.

Abstract

BACKGROUND

Third-generation sequencing methods, such as SMRT (Single Molecule, Real-Time) sequencing developed by Pacific Biosciences, offer much longer reads than Next Generation Sequencing (NGS) methods and are therefore well suited to de novo sequencing and resequencing projects. Datasets generated for these purposes contain not only reads from the nuclear genome but also a substantial number of reads from the organelles of the target organism. These reads are usually discarded, but they can also be used to assemble organellar replicons. Long reads help resolve repetitive regions within organellar genomes, which can be problematic when only short-read data are available. In addition, SMRT sequencing is less affected by GC-rich regions and by long homopolymer stretches.

RESULTS

We describe a workflow for de novo assembly of the sugar beet (Beta vulgaris ssp. vulgaris) chloroplast genome sequence based solely on data from a SMRT sequencing dataset that targeted the nuclear genome. We show that the data from such an experiment are sufficient to produce a high-quality assembly that is more reliable than assemblies derived from, for example, Illumina reads alone. The chloroplast genome is especially challenging to assemble de novo because it contains two large inverted repeat (IR) regions. We also describe limitations that remain even when long reads are used for the assembly.
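
The difficulty posed by the inverted repeats can be illustrated with a toy model. The Python sketch below is not the authors' workflow. It builds two made-up circular genomes that differ only in the orientation of the region between the two IR copies, then checks whether error-free reads of a given length can tell them apart. The sequences, the names `LSC`, `IR` and `SSC`, and all helper functions are assumptions introduced here for illustration.

```python
# Toy illustration (hypothetical sequences, not data from the paper):
# reads that do not span a whole inverted repeat (IR) copy cannot decide
# the orientation of the region enclosed by the two IR copies.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_reads(genome: str, read_len: int) -> set:
    """All error-free reads of one length from a circular genome.

    Each read is stored in a strand-independent form (the smaller of the
    read and its reverse complement), since a read's strand is unknown.
    """
    circular = genome + genome[: read_len - 1]  # unroll the circle once
    reads = set()
    for start in range(len(genome)):
        read = circular[start : start + read_len]
        reads.add(min(read, revcomp(read)))
    return reads


def isomers(lsc: str, ir: str, ssc: str) -> tuple:
    """Two arrangements that differ only in the orientation of `ssc`."""
    forward = lsc + ir + ssc + revcomp(ir)
    flipped = lsc + ir + revcomp(ssc) + revcomp(ir)
    return forward, flipped


def distinguishable(lsc: str, ir: str, ssc: str, read_len: int) -> bool:
    """True if reads of this length differ between the two arrangements."""
    forward, flipped = isomers(lsc, ir, ssc)
    return canonical_reads(forward, read_len) != canonical_reads(flipped, read_len)


# Made-up single-copy regions and a made-up 12-base repeat unit.
LSC = "ATGGCGTCCTTAGGCATCGATTCGGACCTG"
IR = "GATTACAGGCTC"
SSC = "CTTCAGGACCGTAACTGGAA"

if __name__ == "__main__":
    for read_len in (8, len(IR) + 1, len(IR) + 2, 20):
        verdict = distinguishable(LSC, IR, SSC, read_len)
        print(f"read length {read_len:2d}: arrangements distinguishable = {verdict}")
```

In this toy case the two arrangements yield identical read sets until a read covers a full IR copy plus at least one base on each side. This is why read length relative to IR length matters for ordering the elements of a chloroplast assembly.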

CONCLUSIONS

SMRT sequencing reads extracted from a dataset created for nuclear genome (re)sequencing can be used to obtain a high-quality de novo assembly of the chloroplast genome of the sequenced organism. Even with relatively low overall coverage of the nuclear genome, more than enough reads can be collected to generate a high-quality assembly that outperforms short-read-based assemblies. However, even with long reads it is not always possible to determine the order of elements in a chloroplast genome sequence reliably, as we demonstrated with Fosmid End Sequences (FES) generated with Sanger technology. The same limitation applies to short-read sequencing data, where it is reached at a much earlier stage of finishing.
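
The claim that low nuclear coverage can still yield ample chloroplast reads follows from simple depth arithmetic: a small replicon that receives even a few percent of the sequenced bases ends up with very high depth. The sketch below works through that arithmetic in Python. Every number in it is a hypothetical round value chosen for illustration, not a figure from the paper, and `expected_coverage` is a helper name introduced here.

```python
def expected_coverage(total_bases: float, fraction_of_bases: float,
                      replicon_size: float) -> float:
    """Mean depth on a replicon that receives a given share of all sequenced bases."""
    return total_bases * fraction_of_bases / replicon_size


# Hypothetical round numbers for illustration only (not taken from the paper).
NUCLEAR_SIZE = 750_000_000        # assumed nuclear genome size in bases
PLASTID_SIZE = 150_000            # assumed chloroplast genome size in bases
TOTAL_BASES = 10 * NUCLEAR_SIZE   # assumed sequencing yield in bases
PLASTID_FRACTION = 0.02           # assumed share of bases from the chloroplast

nuclear_depth = expected_coverage(TOTAL_BASES, 1 - PLASTID_FRACTION, NUCLEAR_SIZE)
plastid_depth = expected_coverage(TOTAL_BASES, PLASTID_FRACTION, PLASTID_SIZE)

if __name__ == "__main__":
    print(f"nuclear depth: {nuclear_depth:.1f}x")
    print(f"plastid depth: {plastid_depth:.0f}x")
```

With these made-up inputs the chloroplast depth comes out about a hundredfold higher than the nuclear depth, which is the kind of imbalance the conclusion relies on. Real values depend on the tissue, the DNA preparation and the sequencing run.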
