The draft nuclear genome assembly of Eucalyptus pauciflora: a pipeline for comparing de novo assemblies.

Affiliations

Research School of Biology, the Australian National University. 134 Linnaeus Way, Acton, Canberra, ACT, 2601, Australia.

Department of Genetics and Animal Breeding, Faculty of Veterinary Medicine, Chittagong Veterinary and Animal Sciences University. Khulshi, Chattogram, 4225, Bangladesh.

Publication details

Gigascience. 2020 Jan 1;9(1). doi: 10.1093/gigascience/giz160.

DOI:10.1093/gigascience/giz160
PMID:31895413
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6939829/
Abstract

BACKGROUND

Eucalyptus pauciflora (the snow gum) is a long-lived tree of high economic and ecological importance. Currently, little genomic information is available for E. pauciflora. Here, we sequentially assemble the genome of E. pauciflora with different methods and combine multiple existing and novel approaches to help select the best genome assembly.

FINDINGS

We generated high-coverage long-read (Nanopore, 174×) and short-read (Illumina, 228×) data from a single E. pauciflora individual and compared assemblies from 5 assemblers (Canu, SMARTdenovo, Flye, Marvel, and MaSuRCA) built with different minimum read lengths (1 kb and 35 kb). A key component of our approach is to withhold a randomly selected ∼10% of both the long and the short reads from assembly and use them as a validation set for assessing the assemblies. Using this validation set along with a range of existing tools, we compared the assemblies in 8 ways: contig N50, BUSCO scores, LAI (long terminal repeat assembly index) scores, assembly ploidy, base-level error rate, CGAL (computing genome assembly likelihoods) scores, structural variation, and genome sequence similarity. Our results showed that MaSuRCA generated the best assembly: 594.87 Mb in size, with a contig N50 of 3.23 Mb and an estimated error rate of ∼0.006 errors per base.
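Two of the simpler steps described above, withholding a random ∼10% of reads as a validation set and computing contig N50, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes reads are identified by ID strings, and the function names `split_holdout` and `n50` and the `seed` parameter are hypothetical. For paired-end Illumina data, both mates of a pair would need to land on the same side of the split, which this sketch does not handle.

```python
import random
from typing import Iterable, List, Sequence, Tuple


def split_holdout(read_ids: Sequence[str], fraction: float = 0.10,
                  seed: int = 42) -> Tuple[List[str], List[str]]:
    """Randomly partition read IDs into (assembly set, validation set).

    `fraction` is the expected share of reads withheld from assembly
    (~10% in the abstract). `seed` is a hypothetical parameter added
    so the split is reproducible.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    assembly: List[str] = []
    validation: List[str] = []
    for rid in read_ids:
        # Each read independently goes to the validation set with
        # probability `fraction`, so the held-out share is approximate.
        if rng.random() < fraction:
            validation.append(rid)
        else:
            assembly.append(rid)
    return assembly, validation


def n50(contig_lengths: Iterable[int]) -> int:
    """Contig N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    if total == 0:
        raise ValueError("assembly has no sequence")
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid floating-point halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for a non-empty assembly")


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(10_000)]
    asm, val = split_holdout(reads)
    print(len(asm), len(val))
    print(n50([100, 80, 60, 40, 20]))
```

For example, `n50([100, 80, 60, 40, 20])` returns 80: the two longest contigs (180 of 300 bases) are the first to cover at least half of the total length.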

CONCLUSIONS

We report a draft genome of E. pauciflora, which will be a valuable resource for further genomic studies of eucalypts. Our approaches for evaluating and comparing assemblies should help researchers choose among the many potential genome assemblies that can be produced from a single dataset.

