Error correction of high-throughput sequencing datasets with non-uniform coverage.

Affiliation

Department of Computer Science and Engineering, University of California, San Diego, CA, USA.

Publication information

Bioinformatics. 2011 Jul 1;27(13):i137-41. doi: 10.1093/bioinformatics/btr208.

DOI: 10.1093/bioinformatics/btr208
PMID: 21685062
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3117386/
Abstract

MOTIVATION

The continuing improvements to high-throughput sequencing (HTS) platforms have begun to unfold a myriad of new applications. As a result, error correction of sequencing reads remains an important problem. Though several tools do an excellent job of correcting datasets where the reads are sampled close to uniformly, the problem of correcting reads coming from drastically non-uniform datasets, such as those from single-cell sequencing, remains open.

RESULTS

In this article, we develop the method Hammer for error correction without any uniformity assumptions. Hammer is based on a combination of a Hamming graph and a simple probabilistic model for sequencing errors. It is a simple and adaptable algorithm that improves on other tools on non-uniform single-cell data, while achieving comparable results on normal multi-cell data.
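The abstract names the two ingredients (a Hamming graph over k-mers and a probabilistic error model) without giving details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes k-mers are linked when their Hamming distance is at most a hypothetical threshold `tau`, and it swaps the probabilistic model for a plain majority vote inside each connected component. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def hamming_components(kmers, tau=1):
    """Connected components of the graph linking k-mers within distance tau.

    Uses a quadratic all-pairs scan with union-find; fine for a toy example,
    far too slow for real datasets.
    """
    parent = {km: km for km in kmers}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(kmers, 2):
        if hamming(a, b) <= tau:
            parent[find(a)] = find(b)

    groups = {}
    for km in kmers:
        groups.setdefault(find(km), []).append(km)
    return list(groups.values())


def correction_map(reads, k, tau=1):
    """Map each k-mer to the most frequent k-mer in its component.

    Assumption: majority vote stands in for the paper's probabilistic model.
    """
    counts = kmer_counts(reads, k)
    fix = {}
    for comp in hamming_components(sorted(counts), tau):
        center = max(comp, key=lambda km: (counts[km], km))
        for km in comp:
            fix[km] = center
    return fix


reads = ["ACGTAC"] * 5 + ["ACGTTC"]
print(correction_map(reads, k=6))
```

Because each k-mer is compared only with its neighbours in the same component, the decision does not depend on a global coverage cutoff, which is the property that matters when coverage varies sharply across the genome.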

AVAILABILITY

http://www.cs.toronto.edu/~pashadag.

CONTACT

pmedvedev@cs.ucsd.edu.
