ADEPT, a dynamic next generation sequencing data error-detection program with trimming.

Authors

Feng Shihai, Lo Chien-Chi, Li Po-E, Chain Patrick S G

Affiliation

Genome Science Group, Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA.

Publication

BMC Bioinformatics. 2016 Feb 29;17:109. doi: 10.1186/s12859-016-0967-z.

DOI:10.1186/s12859-016-0967-z
PMID:26928302
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4772517/
Abstract

BACKGROUND

Illumina is the most widely used next generation sequencing technology and produces millions of short reads that contain errors. These sequencing errors constitute a major problem in applications such as de novo genome assembly, metagenomics analysis and single nucleotide polymorphism discovery.

RESULTS

In this study, we present ADEPT, a dynamic error detection method that uses the quality scores of each nucleotide and its neighboring nucleotides, together with their positions within the read, and compares these to the position-specific quality score distribution of all bases within the sequencing run. This method greatly improves upon other available methods in terms of the true positive rate of error discovery without affecting the false positive rate, particularly within the middle of reads.
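
The idea described above can be illustrated with a minimal sketch. It is not ADEPT's actual algorithm or code: the function names, the z-score formulation, the one-base neighbor window, and the `z_cutoff` value are all assumptions chosen for illustration. The sketch builds a per-position quality distribution from all reads in a run, then flags a base when both its own quality and the average quality of its neighborhood fall well below what is typical for that read position.

```python
from statistics import mean, pstdev


def position_stats(reads_quals):
    """Per-position (mean, std dev) of quality scores across all reads in a run."""
    max_len = max(len(q) for q in reads_quals)
    stats = []
    for pos in range(max_len):
        column = [q[pos] for q in reads_quals if pos < len(q)]
        stats.append((mean(column), pstdev(column)))
    return stats


def flag_suspect_bases(quals, stats, z_cutoff=-1.5):
    """Return positions whose quality is unusually low for that read position.

    Hypothetical rule (an assumption, not the published method): a base is
    flagged when its own z-score is below z_cutoff and the mean z-score of
    the base plus its immediate neighbors is below z_cutoff / 2.
    """
    z = []
    for pos, q in enumerate(quals):
        mu, sigma = stats[pos]
        # A position with no spread carries no evidence either way.
        z.append((q - mu) / sigma if sigma > 0 else 0.0)

    suspects = []
    for pos in range(len(z)):
        window = z[max(0, pos - 1): pos + 2]  # base plus immediate neighbors
        if z[pos] < z_cutoff and mean(window) < z_cutoff / 2:
            suspects.append(pos)
    return suspects


# Toy run: nine clean reads and one read with a low-quality base in the middle.
run = [[40] * 5 for _ in range(9)] + [[40, 40, 10, 40, 40]]
stats = position_stats(run)
print(flag_suspect_bases([40, 40, 10, 40, 40], stats))  # [2]
```

Because the thresholds are derived from the run's own position-specific distribution rather than from a fixed quality cutoff, a moderately low score near the read end (where quality typically drops) is judged differently from the same score in the middle of a read.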

CONCLUSIONS

ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed. The result is a method that is less prone to position-dependent under-prediction, which is one of the most prominent issues in error prediction. The outcome is that ADEPT improves upon prior efforts in identifying true errors, primarily within the middle of reads, while reducing the false positive rate.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/860b/4772517/fd29c4d7a654/12859_2016_967_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/860b/4772517/25f6e4d411c7/12859_2016_967_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/860b/4772517/394c92c37422/12859_2016_967_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/860b/4772517/2ab557560c67/12859_2016_967_Fig4_HTML.jpg
