Analysis of error profiles in deep next-generation sequencing data.

Affiliations

Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA.

Department of Pathology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA.

Publication information

Genome Biol. 2019 Mar 14;20(1):50. doi: 10.1186/s13059-019-1659-6.

Abstract

BACKGROUND

Sequencing errors are key confounding factors for detecting low-frequency genetic variants that are important for cancer molecular diagnosis, treatment, and surveillance using deep next-generation sequencing (NGS). However, there is a lack of comprehensive understanding of errors introduced at various steps of a conventional NGS workflow, such as sample handling, library preparation, PCR enrichment, and sequencing. In this study, we use current NGS technology to systematically investigate these error sources.

RESULTS

By evaluating read-specific error distributions, we discover that the substitution error rate can be computationally suppressed to 10⁻⁵ to 10⁻⁴, which is 10- to 100-fold lower than the rate generally considered achievable (10⁻³) in the current literature. We then quantify substitution errors attributable to sample handling, library preparation, enrichment PCR, and sequencing by using multiple deep sequencing datasets. We find that error rates differ by nucleotide substitution type, ranging from 10⁻⁵ for A>C/T>G, C>A/G>T, and C>G/G>C changes to 10⁻⁴ for A>G/T>C changes. Furthermore, C>T/G>A errors exhibit strong sequence context dependency, sample-specific effects dominate elevated C>A/G>T errors, and target-enrichment PCR leads to an approximately 6-fold increase in the overall error rate. We also find that more than 70% of hotspot variants can be detected at 0.1% to 0.01% frequency with current NGS technology by applying in silico error suppression.
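The paired notation used above (for example A>G/T>C) groups each substitution with its reverse-complement equivalent, so the twelve possible base changes collapse into six classes. The following Python sketch illustrates one way such per-class error rates could be tallied. It is not the authors' pipeline: the function names, the input layout (mismatch counts keyed by reference and observed base, plus total sequenced bases per reference base), and the toy numbers are all assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Return the strand-symmetric class label, e.g. ('T', 'C') -> 'A>G/T>C'.

    Hypothetical helper: the class is written with the A- or C-based
    change first, matching the notation in the abstract.
    """
    if ref == alt:
        raise ValueError("ref and alt must differ")
    if ref in ("G", "T"):
        # Express the change on the opposite strand so ref is A or C.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}/{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"


def class_error_rates(mismatch_counts, ref_base_depth):
    """Compute an error rate per substitution class.

    mismatch_counts: dict mapping (ref, alt) -> number of mismatching bases.
    ref_base_depth:  dict mapping ref base -> total sequenced bases at
                     positions with that reference base.
    Assumed definition: rate = mismatches in the class divided by all bases
    sequenced over the two reference bases that the class covers.
    """
    errors = {}
    for (ref, alt), count in mismatch_counts.items():
        label = substitution_class(ref, alt)
        errors[label] = errors.get(label, 0) + count

    rates = {}
    for label, count in errors.items():
        canonical_ref = label[0]
        depth = (ref_base_depth[canonical_ref]
                 + ref_base_depth[COMPLEMENT[canonical_ref]])
        rates[label] = count / depth
    return rates


# Toy counts, invented for illustration only (not data from the paper).
toy_mismatches = {("A", "G"): 30, ("T", "C"): 10, ("C", "T"): 5}
toy_depth = {"A": 100_000, "T": 100_000, "C": 50_000, "G": 50_000}

for label, rate in sorted(class_error_rates(toy_mismatches, toy_depth).items()):
    print(f"{label}: {rate:.1e}")
```

Pooling both strands in this way assumes that the two members of a pair are indistinguishable in the data; a strand-aware analysis would keep all twelve types separate.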

CONCLUSIONS

We present the first comprehensive analysis of sequencing error sources in conventional NGS workflows. The error profiles revealed by our study highlight new directions for further improving NGS analysis accuracy both experimentally and computationally, ultimately enhancing the precision of deep sequencing.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4601/6417284/a20ce1f690af/13059_2019_1659_Fig1_HTML.jpg
