Benchmarking of computational error-correction methods for next-generation sequencing data.

Author information

Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA.

Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA.

Publication information

Genome Biol. 2020 Mar 17;21(1):71. doi: 10.1186/s13059-020-01988-3.

DOI:10.1186/s13059-020-01988-3

PMID:32183840

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7079412/

Abstract

BACKGROUND

Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown.

RESULTS

In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods.
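As a rough illustration of how a UMI-based protocol can yield error-free reference reads, the sketch below groups reads by their UMI and takes a per-position majority vote. The function name `umi_consensus`, the input shape, and the simple majority rule are assumptions made for illustration; the abstract does not describe the consensus rule the authors used.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Build one consensus sequence per UMI by per-position majority vote.

    `reads` is an iterable of (umi, sequence) pairs. This is a hypothetical
    helper: the plain majority rule is an assumption, not the paper's protocol.
    """
    # Group all sequences that share the same UMI tag.
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        # Only compare positions covered by every read in the group.
        length = min(len(s) for s in seqs)
        bases = []
        for i in range(length):
            counts = Counter(s[i] for s in seqs)
            # Most frequent base at this position wins.
            bases.append(counts.most_common(1)[0][0])
        consensus[umi] = "".join(bases)
    return consensus


example_reads = [
    ("AAC", "ACGT"),
    ("AAC", "ACGT"),
    ("AAC", "ACTT"),  # one read carries a substitution at position 2
    ("GGT", "TTTT"),
]
consensus_by_umi = umi_consensus(example_reads)
```

Because reads sharing a UMI derive from the same original molecule, a base that disagrees with the majority is more likely a sequencing error than a true variant, which is why a vote of this kind can serve as a ground truth for benchmarking.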

CONCLUSIONS

In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.
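The balance between precision and sensitivity can be made concrete with a base-level comparison of a raw read, its corrected version, and the true sequence. The sketch below uses the standard definitions, precision = TP / (TP + FP) and sensitivity = TP / (TP + FN). The function name `evaluate_correction`, the restriction to equal-length reads (substitutions only), and the choice to count an error changed to another wrong base as a false negative are assumptions for illustration, not the paper's exact evaluation procedure.

```python
def evaluate_correction(raw, corrected, truth):
    """Return (precision, sensitivity) for one read, compared base by base.

    Assumes all three strings have equal length (substitution errors only).
    Hypothetical helper; the classification rules below are assumptions.
    """
    tp = fp = fn = 0
    for r, c, t in zip(raw, corrected, truth):
        if r != t:
            # The raw base was an error.
            if c == t:
                tp += 1  # error fixed correctly
            else:
                fn += 1  # error missed or changed to another wrong base
        elif c != t:
            fp += 1  # a correct base was wrongly altered
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, sensitivity


truth = "ACGTACGT"
raw = "ACCTACGA"        # errors at positions 2 and 7
corrected = "ACGTTCGA"  # fixes position 2, breaks position 4, misses position 7
precision, sensitivity = evaluate_correction(raw, corrected, truth)
```

A method that rarely edits bases tends toward high precision and low sensitivity, while an aggressive method shows the opposite pattern, so reporting both exposes the trade-off.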

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a13/7079412/bcc2916505f4/13059_2020_1988_Fig1_HTML.jpg
