ReMILO: reference assisted misassembly detection algorithm using short and long reads.

Affiliations

Software Engineering Research Center, School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China.

Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA.

Publication information

Bioinformatics. 2018 Jan 1;34(1):24-32. doi: 10.1093/bioinformatics/btx524.

DOI: 10.1093/bioinformatics/btx524
PMID: 28961789

Abstract

MOTIVATION

Contigs assembled from second-generation sequencing short reads may contain misassemblies, which complicate downstream analysis or even lead to incorrect results. Fortunately, as more sequenced species become available, the reference genome of a closely related species can be used to detect misassemblies. In addition, long reads from third-generation sequencing technology are increasingly widely used and can also help detect misassemblies.

RESULTS

Here, we introduce ReMILO, a reference-assisted misassembly detection algorithm that uses both short reads and PacBio SMRT long reads. ReMILO aligns the initial short reads to both the contigs and the reference genome, and then constructs a novel data structure called the red-black multipositional de Bruijn graph to detect misassemblies. In addition, ReMILO aligns the contigs to the long reads and finds their differences from the long reads to detect more misassemblies. In our performance test on short read assemblies of human chromosome 14 data, ReMILO detects 41.8-77.9% of extensive misassemblies and 33.6-54.5% of local misassemblies. On hybrid short and long read assemblies of S. pastorianus data, ReMILO detects 60.6-70.9% of extensive misassemblies and 28.6-54.0% of local misassemblies.
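
The abstract does not define the red-black multipositional de Bruijn graph, so the following Python sketch is not ReMILO's algorithm. It only illustrates the general idea of tagging each k-mer of a read with its position in both the contig and the reference, then flagging contig positions where the contig-to-reference offset changes. All names (`PositionalKmer`, `positional_kmers`, `candidate_breakpoints`, `tolerance`) are hypothetical, the red/black coloring is omitted, and alignments are assumed to be gap-free and already computed.

```python
from typing import NamedTuple


class PositionalKmer(NamedTuple):
    """A k-mer tagged with two positions (hypothetical node type)."""
    kmer: str
    contig_pos: int  # start of the k-mer in contig coordinates
    ref_pos: int     # start of the k-mer in reference coordinates


def positional_kmers(seq, contig_start, ref_start, k):
    """Split one read into k-mers, assuming gap-free alignments to both targets."""
    return [
        PositionalKmer(seq[i:i + k], contig_start + i, ref_start + i)
        for i in range(len(seq) - k + 1)
    ]


def candidate_breakpoints(reads, k, tolerance=0):
    """Return contig positions where the contig-to-reference offset shifts.

    `reads` is an iterable of (sequence, contig_start, ref_start) tuples.
    A shift larger than `tolerance` between neighbouring k-mers along the
    contig is reported as a candidate misassembly breakpoint.
    """
    nodes = set()
    for seq, contig_start, ref_start in reads:
        nodes.update(positional_kmers(seq, contig_start, ref_start, k))

    ordered = sorted(nodes, key=lambda n: (n.contig_pos, n.ref_pos))
    breakpoints = []
    for prev, cur in zip(ordered, ordered[1:]):
        prev_offset = prev.ref_pos - prev.contig_pos
        cur_offset = cur.ref_pos - cur.contig_pos
        if abs(cur_offset - prev_offset) > tolerance:
            breakpoints.append(cur.contig_pos)
    return breakpoints


if __name__ == "__main__":
    # Toy input: the third read maps far away in the reference.
    toy_reads = [
        ("ACGTAC", 0, 100),
        ("GTACGG", 2, 102),
        ("TTGACA", 8, 500),
    ]
    print(candidate_breakpoints(toy_reads, k=4))
```

In this toy input the first two reads share the same contig-to-reference offset, while the third read's offset differs, so the contig position where it starts is the only one reported. A real detector would also need to handle indels, repeats, and genuine differences between the target genome and the related reference.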

AVAILABILITY AND IMPLEMENTATION

The ReMILO software can be downloaded for free under Artistic License 2.0 from this site: https://github.com/songc001/remilo.

CONTACT

baoe@bjtu.edu.cn.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1
ReMILO: reference assisted misassembly detection algorithm using short and long reads.
Bioinformatics. 2018 Jan 1;34(1):24-32. doi: 10.1093/bioinformatics/btx524.
2
AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references.
Bioinformatics. 2014 Jun 15;30(12):i319-i328. doi: 10.1093/bioinformatics/btu291.
3
Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads.
PLoS Comput Biol. 2017 Jun 8;13(6):e1005595. doi: 10.1371/journal.pcbi.1005595. eCollection 2017 Jun.
4
AlignGraph2: similar genome-assisted reassembly pipeline for PacBio long reads.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab022.
5
GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments.
Bioinformatics. 2015 Dec 1;31(23):3733-41. doi: 10.1093/bioinformatics/btv465. Epub 2015 Aug 10.
6
RepLong: de novo repeat identification using long read sequencing data.
Bioinformatics. 2018 Apr 1;34(7):1099-1107. doi: 10.1093/bioinformatics/btx717.
7
Genome sequence assembly algorithms and misassembly identification methods.
Mol Biol Rep. 2022 Nov;49(11):11133-11148. doi: 10.1007/s11033-022-07919-8. Epub 2022 Sep 23.
8
FLAS: fast and high-throughput algorithm for PacBio long-read self-correction.
Bioinformatics. 2019 Oct 15;35(20):3953-3960. doi: 10.1093/bioinformatics/btz206.
9
HALC: High throughput algorithm for long read error correction.
BMC Bioinformatics. 2017 Apr 5;18(1):204. doi: 10.1186/s12859-017-1610-3.
10
Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs.
Bioinformatics. 2020 Mar 1;36(5):1374-1381. doi: 10.1093/bioinformatics/btz102.

Cited by

1
Klumpy: A tool to evaluate the integrity of long-read genome assemblies and illusive sequence motifs.
Mol Ecol Resour. 2025 Jan;25(1):e13982. doi: 10.1111/1755-0998.13982. Epub 2024 May 27.
2
LongStitch: high-quality genome assembly correction and scaffolding using long reads.
BMC Bioinformatics. 2021 Oct 30;22(1):534. doi: 10.1186/s12859-021-04451-7.
3
Hardware acceleration of genomics data analysis: challenges and opportunities.
Bioinformatics. 2021 Jul 27;37(13):1785-1795. doi: 10.1093/bioinformatics/btab017.
4
SKESA: strategic k-mer extension for scrupulous assemblies.
Genome Biol. 2018 Oct 4;19(1):153. doi: 10.1186/s13059-018-1540-z.