PCR-induced transitions are the major source of error in cleaned ultra-deep pyrosequencing data.

Affiliation

Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden.

Publication information

PLoS One. 2013 Jul 23;8(7):e70388. doi: 10.1371/journal.pone.0070388. Print 2013.

Abstract

BACKGROUND

Ultra-deep pyrosequencing (UDPS) is used to identify rare sequence variants. The sequence depth is influenced by several factors including the error frequency of PCR and UDPS. This study investigated the characteristics and source of errors in raw and cleaned UDPS data.

RESULTS

UDPS of a 167-nucleotide fragment of the HIV-1 SG3Δenv plasmid was performed on the Roche/454 platform. The plasmid was diluted to one copy, PCR amplified and subjected to bidirectional UDPS on three occasions. The dataset consisted of 47,693 UDPS reads. Raw UDPS data had an average error frequency of 0.30% per nucleotide site. Most errors were insertions and deletions in homopolymeric regions. We used a cleaning strategy that removed almost all indel errors but had little effect on substitution errors; it reduced the error frequency to 0.056% per nucleotide. In cleaned data the error frequency was similar in homopolymeric and non-homopolymeric regions, but varied considerably across sites. These site-specific error frequencies were moderately, but still significantly, correlated between runs (r=0.15-0.65) and between forward and reverse sequencing directions within runs (r=0.33-0.65). Furthermore, transition errors were 48 times more common than transversion errors (0.052% vs. 0.001%; p<0.0001). Collectively, the results indicate that a considerable proportion of the sequencing errors that remained after data cleaning were generated during the PCR that preceded UDPS.
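The transition versus transversion comparison reported above rests on classifying each substitution by base class: a purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) change is a transition, and any other change is a transversion. The following is a minimal sketch of that bookkeeping, not the authors' pipeline. It assumes reads that are already cleaned, gap-free and aligned to the reference at the same start position; the function names and the toy sequences are hypothetical.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref_base, read_base):
    """Return 'transition', 'transversion', or None when the bases match."""
    if ref_base == read_base:
        return None
    pair = {ref_base, read_base}
    # Both bases in the same chemical class means a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def substitution_frequencies(reference, reads):
    """Per-nucleotide substitution frequency for each class.

    Assumes every read is gap-free and aligned to `reference`
    starting at position 0 (an illustrative simplification).
    """
    counts = Counter()
    total_bases = 0
    for read in reads:
        for ref_base, read_base in zip(reference, read):
            total_bases += 1
            kind = classify_substitution(ref_base, read_base)
            if kind is not None:
                counts[kind] += 1
    if total_bases == 0:
        return {"transition": 0.0, "transversion": 0.0}
    return {
        kind: counts[kind] / total_bases
        for kind in ("transition", "transversion")
    }


# Toy data only: one A->G transition and one T->A transversion in 12 bases.
toy_reference = "ACGT"
toy_reads = ["ACGT", "GCGT", "ACGA"]
toy_result = substitution_frequencies(toy_reference, toy_reads)
```

Dividing the two frequencies gives the transition-to-transversion ratio; on real data the same counts could also be kept per site to examine the site-to-site variation described above.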

CONCLUSIONS

A majority of the sequencing errors that remained after data cleaning were introduced by PCR prior to sequencing, which means that they will be independent of the platform used for next-generation sequencing. The transition vs. transversion error bias in cleaned UDPS data will influence the detection limits of rare mutations and sequence variants.
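One way to see how the error bias affects detection limits is to ask how many reads must carry a variant before background error alone is an unlikely explanation. The sketch below is an illustration under stated assumptions, not a method from the study: it models error counts at a site as binomial, treats the average transition and transversion frequencies as if they applied uniformly to every site (the abstract notes that site-specific frequencies vary considerably), and uses a hypothetical significance level `alpha` and a hypothetical function name.

```python
def min_variant_count(depth, error_rate, alpha=0.001):
    """Smallest k with P(X >= k) < alpha for X ~ Binomial(depth, error_rate).

    A variant seen in at least k reads is then unlikely to arise from
    background error alone, under the uniform-rate assumption.
    """
    pmf = (1.0 - error_rate) ** depth  # P(X = 0)
    cdf = 0.0                          # P(X < k), updated as k grows
    k = 0
    while True:
        tail = 1.0 - cdf               # P(X >= k)
        if tail < alpha:
            return k
        cdf += pmf
        # Binomial recurrence: P(X = k + 1) from P(X = k).
        pmf *= (depth - k) / (k + 1) * error_rate / (1.0 - error_rate)
        k += 1


# Average frequencies from the abstract, used here as uniform per-site rates.
TRANSITION_RATE = 0.00052    # 0.052%
TRANSVERSION_RATE = 0.00001  # 0.001%

depth = 10_000  # hypothetical read depth at one site
transition_threshold = min_variant_count(depth, TRANSITION_RATE)
transversion_threshold = min_variant_count(depth, TRANSVERSION_RATE)
```

Under these assumptions the threshold for a transition variant is higher than for a transversion variant at the same depth, so rare transversions can be called at lower frequencies than rare transitions.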

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63ac/3720931/ffd6892d72aa/pone.0070388.g001.jpg
