Reptile: representative tiling for short read error correction.

Affiliation

Department of Electrical and Computer Engineering, Iowa State University, Ames IA 50011, USA.

Publication information

Bioinformatics. 2010 Oct 15;26(20):2526-33. doi: 10.1093/bioinformatics/btq468. Epub 2010 Aug 16.


DOI: 10.1093/bioinformatics/btq468
PMID: 20834037
Abstract

MOTIVATION: Error correction is critical to the success of next-generation sequencing applications, such as resequencing and de novo genome sequencing. It is especially important for high-throughput short-read sequencing, where reads are much shorter and more abundant, and errors more frequent than in traditional Sanger sequencing. Processing massive numbers of short reads with existing error correction methods is both compute and memory intensive, yet the results are far from satisfactory when applied to real datasets.

RESULTS: We present a novel approach, termed Reptile, for error correction in short-read data from next-generation sequencing. Reptile works with the spectrum of k-mers from the input reads, and corrects errors by simultaneously examining: (i) Hamming distance-based correction possibilities for potentially erroneous k-mers; and (ii) neighboring k-mers from the same read for correct contextual information. By not needing to store input data, Reptile has the favorable property that it can handle data that does not fit in main memory. In addition to sequence data, Reptile can make use of available quality score information. Our experiments show that Reptile outperforms previous methods in the percentage of errors removed from the data and the accuracy in true base assignment. In addition, a significant reduction in run time and memory usage has been achieved compared with previous methods, making it more practical for short-read error correction when sampling larger genomes.

AVAILABILITY: Reptile is implemented in C++ and is available through the link: http://aluru-sun.ece.iastate.edu/doku.php?id=software

CONTACT: aluru@iastate.edu.
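The two ingredients named in the abstract, Hamming distance-based candidates and contextual k-mers from the same read, can be illustrated with a toy sketch. This is not the Reptile implementation (which is written in C++ and uses representative tilings); it is a simplified illustration under stated assumptions. The function names, the `min_count` threshold for calling a k-mer trustworthy, and the restriction to single substitutions are all hypothetical choices made for this example.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every k-mer in an iterable of reads (one streaming pass)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def hamming_neighbors(kmer, alphabet="ACGT"):
    """Yield every k-mer at Hamming distance exactly 1 from `kmer`."""
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def correct_read(read, counts, k, min_count=2):
    """Correct single substitutions in `read` (illustrative assumption:
    a k-mer seen fewer than `min_count` times is potentially erroneous).

    A Hamming-distance-1 candidate is accepted only if every k-mer of the
    read that overlaps the changed base is also frequent, and only if
    exactly one candidate passes that contextual check.
    """
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if counts[kmer] >= min_count:
            continue  # k-mer looks trustworthy, nothing to do
        accepted = []
        for cand in hamming_neighbors(kmer):
            if counts[cand] < min_count:
                continue
            trial = read[:i] + cand + read[i + k:]
            # Position in the read of the single substituted base.
            pos = i + next(j for j in range(k) if cand[j] != kmer[j])
            first = max(0, pos - k + 1)
            last = min(pos, len(trial) - k)
            # Context check: all windows covering `pos` must be frequent.
            if all(counts[trial[s:s + k]] >= min_count
                   for s in range(first, last + 1)):
                accepted.append(trial)
        if len(accepted) == 1:  # ambiguous or empty: leave the read alone
            read = accepted[0]
    return read


# Toy data: five copies of a true sequence and one read with a substitution.
true_seq = "ACGTTGCAAGCT"
noisy = "ACGTTGCTAGCT"  # A -> T at index 7
reads = [true_seq] * 5 + [noisy]
spectrum = kmer_spectrum(reads, k=4)
print(correct_read(noisy, spectrum, k=4))  # ACGTTGCAAGCT
```

In this toy input the rare k-mer `TGCT` has two frequent Hamming neighbors, `AGCT` and `TGCA`. Only `TGCA` leaves the overlapping k-mers of the read frequent as well, which is why the contextual check, and not Hamming distance alone, decides the correction. Because `kmer_spectrum` accepts any iterable, the reads could be streamed from a file rather than held in memory, although the sketch does not attempt to reproduce Reptile's actual memory behavior or its use of quality scores.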
