LongStitch: high-quality genome assembly correction and scaffolding using long reads.

Affiliation

Canada's Michael Smith Genome Sciences Centre, BC Cancer Research, 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada.

Publication information

BMC Bioinformatics. 2021 Oct 30;22(1):534. doi: 10.1186/s12859-021-04451-7.

DOI: 10.1186/s12859-021-04451-7
PMID: 34717540
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8557608/
Abstract

BACKGROUND

Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads.

RESULTS

LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long) followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short- and long-read assemblies of Caenorhabditis elegans, Oryza sativa, and three different human individuals using corresponding nanopore long-read data, and improved the contiguity of each assembly by 1.2-fold to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to the state-of-the-art long-read scaffolder LRScaf in most tests, and consistently improves upon human assemblies in under five hours using less than 23 GB of RAM.
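
The abstract says ntLink joins contigs through lightweight minimizer mappings. The sketch below illustrates only the general idea, not ntLink's implementation. It picks one k-mer per sliding window (the one with the smallest hash) and uses the picked k-mers to see which contigs a long read touches, and in what order. The function names, the CRC32 hash, the values of `k` and `w`, the forward-strand-only handling, and the toy sequences are all assumptions made for illustration.

```python
import zlib


def minimizers(seq, k=5, w=4):
    """Return ordered (position, k-mer) minimizers of seq.

    For every window of w consecutive k-mers, keep the k-mer with the
    smallest hash (ties go to the leftmost). A k-mer occurrence chosen by
    several consecutive windows is reported once.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # CRC32 is a stand-in hash chosen only because it is deterministic.
    hashes = [zlib.crc32(km.encode()) for km in kmers]
    picked = []
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w), key=lambda i: (hashes[i], i))
        if not picked or picked[-1][0] != best:
            picked.append((best, kmers[best]))
    return picked


def contig_hits(read, contigs, k=5, w=4):
    """List the contigs a read shares minimizers with, in read order.

    Only minimizers found in exactly one contig are used, so repeated
    sequence does not produce ambiguous links.
    """
    index = {}
    for name, seq in contigs.items():
        for _, km in minimizers(seq, k, w):
            index.setdefault(km, set()).add(name)
    hits = []
    for _, km in minimizers(read, k, w):
        names = index.get(km, ())
        if len(names) == 1:
            name = next(iter(names))
            if not hits or hits[-1] != name:
                hits.append(name)
    return hits


if __name__ == "__main__":
    # Hypothetical toy contigs and a read that spans both of them.
    contig_a = "ACCACAAACCCACA"
    contig_b = "GTTGTGGGTTTGTG"
    read = contig_a + contig_b
    print(contig_hits(read, {"A": contig_a, "B": contig_b}))  # ['A', 'B']
```

A read that hits contig A and then contig B is evidence that the two contigs are adjacent in the genome. A real scaffolder would also need to handle both strands, estimate gap sizes, and weigh evidence from many reads, none of which is attempted here.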

CONCLUSIONS

Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch .
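
The contiguity gains in the Results are reported as fold changes in NGA50. As a rough illustration of that kind of metric, the sketch below computes NG50: the block length at which blocks of that length or longer cover at least half of the genome size. NGA50 applies the same calculation to aligned blocks, meaning contigs that have been broken at misassemblies after alignment to a reference. No alignment is done here, and the function name and all lengths are hypothetical toy values, not data from the paper.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of a list of block lengths.

    Blocks are taken from longest to shortest; the result is the length of
    the block at which the running total first reaches half of genome_size.
    Returns 0 if the blocks never cover half of the genome.
    """
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if total * 2 >= genome_size:
            return length
    return 0


if __name__ == "__main__":
    genome_size = 100              # hypothetical genome size
    before = [40, 30, 20, 10]      # hypothetical draft contig lengths
    after = [70, 30]               # same sequence after joining contigs
    fold = ng50(after, genome_size) / ng50(before, genome_size)
    print(ng50(before, genome_size), ng50(after, genome_size), round(fold, 2))
```

In this toy case the metric rises from 30 to 70, a 2.33-fold improvement, which is the form in which the abstract states its 1.2-fold to 304.6-fold range.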

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8193/8557608/57f518d9c24b/12859_2021_4451_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8193/8557608/d16ea66f4657/12859_2021_4451_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8193/8557608/c1cd90a6c122/12859_2021_4451_Fig3_HTML.jpg

Similar articles

1
LongStitch: high-quality genome assembly correction and scaffolding using long reads.
BMC Bioinformatics. 2021 Oct 30;22(1):534. doi: 10.1186/s12859-021-04451-7.
2
ntLink: A Toolkit for De Novo Genome Assembly Scaffolding and Mapping Using Long Reads.
Curr Protoc. 2023 Apr;3(4):e733. doi: 10.1002/cpz1.733.
3
ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers.
BMC Bioinformatics. 2018 Jun 20;19(1):234. doi: 10.1186/s12859-018-2243-x.
4
Tigmint: correcting assembly errors using linked reads from large molecules.
BMC Bioinformatics. 2018 Oct 26;19(1):393. doi: 10.1186/s12859-018-2425-6.
5
LRScaf: improving draft genomes using long noisy reads.
BMC Genomics. 2019 Dec 9;20(1):955. doi: 10.1186/s12864-019-6337-2.
6
ntJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs.
Bioinformatics. 2020 Jun 1;36(12):3885-3887. doi: 10.1093/bioinformatics/btaa253.
7
SpLitteR: diploid genome assembly using TELL-Seq linked-reads and assembly graphs.
PeerJ. 2024 Sep 27;12:e18050. doi: 10.7717/peerj.18050. eCollection 2024.
8
SLR-superscaffolder: a de novo scaffolding tool for synthetic long reads using a top-to-bottom scheme.
BMC Bioinformatics. 2021 Mar 25;22(1):158. doi: 10.1186/s12859-021-04081-z.
9
MaGuS: a tool for quality assessment and scaffolding of genome assemblies with Whole Genome Profiling™ Data.
BMC Bioinformatics. 2016 Mar 3;17:115. doi: 10.1186/s12859-016-0969-x.
10
Maptcha: an efficient parallel workflow for hybrid genome scaffolding.
BMC Bioinformatics. 2024 Aug 8;25(1):263. doi: 10.1186/s12859-024-05878-4.

Cited by

1
Two Melanthiaceae genomes with dramatic size difference provide insights into giant genome evolution and maintenance.
Nat Plants. 2025 Aug;11(8):1500-1513. doi: 10.1038/s41477-025-02060-3. Epub 2025 Aug 1.
2
Chromosomal scale assembly and functional annotation of the apicomplexan parasite Eimeria acervulina.
Sci Data. 2025 May 23;12(1):852. doi: 10.1038/s41597-025-04653-1.
3
Mediterranean monk seal (Monachus monachus) and leopard seal (Hydrurga leptonyx) de novo genomes to study the demographic history and genetic diversity of southern seals.
BMC Biol. 2025 Apr 16;23(1):102. doi: 10.1186/s12915-025-02207-w.
4
Whole Genome Sequence of the gut commensal protist Tritrichomonas musculus isolated from laboratory mice.
Sci Data. 2025 Apr 8;12(1):590. doi: 10.1038/s41597-025-04921-0.
5
Evaluating long-read assemblers to assemble several aphididae genomes.
Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf105.
6
Post-embryonic tail development through molting of the freshwater shrimp .
iScience. 2025 Jan 23;28(2):111885. doi: 10.1016/j.isci.2025.111885. eCollection 2025 Feb 21.
7
De novo whole-genome assembly of the critically endangered southern muriqui (Brachyteles arachnoides).
G3 (Bethesda). 2025 Apr 17;15(4). doi: 10.1093/g3journal/jkaf034.
8
Small but Mitey: A Gapless Telomere-to-Telomere Assembly of an Unidentified Mite With a Streamlined Genome.
Genome Biol Evol. 2025 Feb 3;17(2). doi: 10.1093/gbe/evaf023.
9
A CACTA-like transposon in the Anthocyanidin synthase 1 (Ans-1) gene is responsible for apricot fruit colour in the raspberry (Rubus idaeus) cultivar 'Varnes'.
PLoS One. 2025 Feb 3;20(2):e0318692. doi: 10.1371/journal.pone.0318692. eCollection 2025.
10
Chromosome-scale telomere to telomere genome assembly of common crystalwort (Riccia sorocarpa Bisch.).
Sci Data. 2025 Jan 15;12(1):77. doi: 10.1038/s41597-025-04373-6.

References

1
A comprehensive evaluation of long read error correction methods.
BMC Genomics. 2020 Dec 21;21(Suppl 6):889. doi: 10.1186/s12864-020-07227-0.
2
Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes.
Nat Biotechnol. 2020 Sep;38(9):1044-1053. doi: 10.1038/s41587-020-0503-6. Epub 2020 May 4.
3
Long-read human genome sequencing and its applications.
Nat Rev Genet. 2020 Oct;21(10):597-614. doi: 10.1038/s41576-020-0236-x. Epub 2020 Jun 5.
4
ntJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs.
Bioinformatics. 2020 Jun 1;36(12):3885-3887. doi: 10.1093/bioinformatics/btaa253.
5
Opportunities and challenges in long-read sequencing data analysis.
Genome Biol. 2020 Feb 7;21(1):30. doi: 10.1186/s13059-020-1935-5.
6
Fast and accurate long-read assembly with wtdbg2.
Nat Methods. 2020 Feb;17(2):155-158. doi: 10.1038/s41592-019-0669-3. Epub 2019 Dec 9.
7
LRScaf: improving draft genomes using long noisy reads.
BMC Genomics. 2019 Dec 9;20(1):955. doi: 10.1186/s12864-019-6337-2.
8
Efficient and unique cobarcoding of second-generation sequencing reads from long DNA molecules enabling cost-effective and accurate sequencing, haplotyping, and de novo assembly.
Genome Res. 2019 May;29(5):798-808. doi: 10.1101/gr.245126.118. Epub 2019 Apr 2.
9
Tigmint: correcting assembly errors using linked reads from large molecules.
BMC Bioinformatics. 2018 Oct 26;19(1):393. doi: 10.1186/s12859-018-2425-6.
10
Versatile genome assembly evaluation with QUAST-LG.
Bioinformatics. 2018 Jul 1;34(13):i142-i150. doi: 10.1093/bioinformatics/bty266.