Primer ID Validates Template Sampling Depth and Greatly Reduces the Error Rate of Next-Generation Sequencing of HIV-1 Genomic RNA Populations.

Author information

Zhou Shuntai, Jones Corbin, Mieczkowski Piotr, Swanstrom Ronald

Affiliations

UNC Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.

Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.

Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.

Publication information

J Virol. 2015 Aug;89(16):8540-55. doi: 10.1128/JVI.00522-15. Epub 2015 Jun 3.

DOI:10.1128/JVI.00522-15

PMID:26041299

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC4524263/

Abstract

Validating the sampling depth and reducing sequencing errors are critical for studies of viral populations using next-generation sequencing (NGS). We previously described the use of Primer ID to tag each viral RNA template with a block of degenerate nucleotides in the cDNA primer. We now show that low-abundance Primer IDs (offspring Primer IDs) are generated due to PCR/sequencing errors. These artifactual Primer IDs can be removed using a cutoff model for the number of reads required to make a template consensus sequence. We have modeled the fraction of sequences lost due to Primer ID resampling. For a typical sequencing run, less than 10% of the raw reads are lost to offspring Primer ID filtering and resampling. The remaining raw reads are used to correct for PCR resampling and sequencing errors. We also demonstrate that Primer ID reveals bias intrinsic to PCR, especially at low template input or utilization. cDNA synthesis and PCR convert ca. 20% of RNA templates into recoverable sequences, and 30-fold sequence coverage recovers most of these template sequences. We have directly measured the residual error rate to be around 1 in 10,000 nucleotides. We use this error rate and the Poisson distribution to define the cutoff to identify preexisting drug resistance mutations at low abundance in an HIV-infected subject. Collectively, these studies show that >90% of the raw sequence reads can be used to validate template sampling depth and to dramatically reduce the error rate in assessing a genetically diverse viral population using NGS.
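The template consensus step described above can be illustrated with a minimal sketch. It assumes reads have already been parsed into (Primer ID, sequence) pairs and that sequences sharing a Primer ID are aligned and of equal length. The function name `template_consensus` and the fixed `min_reads` threshold are hypothetical placeholders; the paper's cutoff model for the number of reads is not reproduced here.

```python
from collections import Counter, defaultdict


def template_consensus(reads, min_reads=3):
    """Build one consensus sequence per Primer ID.

    reads: iterable of (primer_id, sequence) pairs (assumed pre-aligned,
           equal-length sequences within each Primer ID group).
    min_reads: illustrative cutoff; groups with fewer reads are discarded
               as possible offspring Primer IDs or under-sampled templates.
    """
    groups = defaultdict(list)
    for primer_id, seq in reads:
        groups[primer_id].append(seq)

    consensus = {}
    for primer_id, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few reads to call a reliable consensus
        # Majority vote at each position across all reads of this template.
        consensus[primer_id] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


example_reads = [
    ("AAAA", "ACGT"),
    ("AAAA", "ACGT"),
    ("AAAA", "ACGA"),  # read carrying a sequencing error at the last base
    ("AAAT", "ACGT"),  # low-abundance Primer ID, dropped by the cutoff
]
print(template_consensus(example_reads))
```

In this toy input the single erroneous read is outvoted and the one-read Primer ID is filtered out. A simple majority vote is only one possible consensus rule, and ties are resolved arbitrarily in this sketch.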

IMPORTANCE

Although next-generation sequencing (NGS) has revolutionized sequencing strategies, it suffers from serious limitations in defining sequence heterogeneity in a genetically diverse population, such as HIV-1, due to PCR resampling and PCR/sequencing errors. The Primer ID approach reveals the true sampling depth and greatly reduces errors. Knowing the sampling depth allows the construction of a model of how to maximize the recovery of sequences from input templates and to reduce resampling of the Primer ID so that appropriate multiplexing can be included in the experimental design. With the defined sampling depth and measured error rate, we are able to assign cutoffs for the accurate detection of minority variants in viral populations. This approach allows the power of NGS to be realized without having to guess about sampling depth or to ignore the problem of PCR resampling, while also being able to correct most of the errors in the data set.
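One way to see how a known sampling depth and a measured error rate translate into a detection cutoff is the following sketch. It assumes that residual errors at a given position follow a Poisson distribution with mean equal to the number of template consensus sequences times the per-position error rate, and it takes the roughly 1 in 10,000 figure from the abstract as that rate. The function names and the significance level `alpha` are hypothetical choices, not the authors' exact procedure.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    cdf = 0.0
    term = math.exp(-lam)  # P(X = 0)
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def min_variant_count(n_templates, error_rate=1e-4, alpha=0.05):
    """Smallest number of consensus sequences carrying a variant such that
    residual error alone would produce that many with probability < alpha.

    n_templates: number of template consensus sequences (sampling depth).
    error_rate: assumed per-position residual error rate.
    alpha: illustrative significance level (an assumption of this sketch).
    """
    lam = n_templates * error_rate  # expected errors at one position
    k = 1
    while poisson_sf(k, lam) >= alpha:
        k += 1
    return k


for depth in (100, 1000, 10000):
    print(depth, min_variant_count(depth))
```

Under these assumptions the required count grows with sampling depth, because more templates give residual errors more chances to appear at any one position.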

Similar articles

Primer ID Validates Template Sampling Depth and Greatly Reduces the Error Rate of Next-Generation Sequencing of HIV-1 Genomic RNA Populations.

J Virol. 2015 Aug;89(16):8540-55. doi: 10.1128/JVI.00522-15. Epub 2015 Jun 3.

Primer ID Informs Next-Generation Sequencing Platforms and Reveals Preexisting Drug Resistance Mutations in the HIV-1 Reverse Transcriptase Coding Domain.

AIDS Res Hum Retroviruses. 2015 Jun;31(6):658-68. doi: 10.1089/AID.2014.0031. Epub 2015 Apr 2.

Ultrasensitive single-genome sequencing: accurate, targeted, next generation sequencing of HIV-1 RNA.

Retrovirology. 2016 Dec 20;13(1):87. doi: 10.1186/s12977-016-0321-6.

Challenges with using primer IDs to improve accuracy of next generation sequencing.

PLoS One. 2015 Mar 5;10(3):e0119123. doi: 10.1371/journal.pone.0119123. eCollection 2015.

Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID.

Proc Natl Acad Sci U S A. 2011 Dec 13;108(50):20166-71. doi: 10.1073/pnas.1110064108. Epub 2011 Nov 30.

Fact and Fiction about 1%: Next Generation Sequencing and the Detection of Minor Drug Resistant Variants in HIV-1 Populations with and without Unique Molecular Identifiers.

Viruses. 2020 Aug 4;12(8):850. doi: 10.3390/v12080850.

Unique Molecular Identifiers and Multiplexing Amplicons Maximize the Utility of Deep Sequencing To Critically Assess Population Diversity in RNA Viruses.

ACS Infect Dis. 2022 Dec 9;8(12):2505-2514. doi: 10.1021/acsinfecdis.2c00319. Epub 2022 Nov 3.

A Comprehensive Analysis of Primer IDs to Study Heterogeneous HIV-1 Populations.

J Mol Biol. 2016 Jan 16;428(1):238-250. doi: 10.1016/j.jmb.2015.12.012. Epub 2015 Dec 19.

Universal and blocking primer mismatches limit the use of high-throughput DNA sequencing for the quantitative metabarcoding of arthropods.

Mol Ecol Resour. 2015 Jul;15(4):819-30. doi: 10.1111/1755-0998.12355. Epub 2014 Dec 23.

Primer ID Next-Generation Sequencing for the Analysis of a Broad Spectrum Antiviral Induced Transition Mutations and Errors Rates in a Coronavirus Genome.

Bio Protoc. 2021 Mar 5;11(5):e3938. doi: 10.21769/BioProtoc.3938.

Cited by

Patterns of inflammation and immune activation by coreceptor use in people living with HIV-1.

Front Immunol. 2025 Jul 10;16:1632287. doi: 10.3389/fimmu.2025.1632287. eCollection 2025.

Next-Generation Sequencing Methods to Determine the Accuracy of Retroviral Reverse Transcriptases: Advantages and Limitations.

Viruses. 2025 Jan 26;17(2):173. doi: 10.3390/v17020173.

HIV-1 Rebound Virus Consists of a Small Number of Lineages That Entered the Reservoir Close to ART Initiation.

bioRxiv. 2025 Jan 31:2025.01.29.635391. doi: 10.1101/2025.01.29.635391.

Neurosymptomatic HIV-1 CSF escape is associated with replication in CNS T cells and inflammation.

J Clin Invest. 2024 Oct 1;134(19):e176358. doi: 10.1172/JCI176358.

N4-Hydroxycytidine/molnupiravir inhibits RNA virus-induced encephalitis by producing less fit mutated viruses.

PLoS Pathog. 2024 Sep 30;20(9):e1012574. doi: 10.1371/journal.ppat.1012574. eCollection 2024 Sep.

Impact of Low-Frequency Human Immunodeficiency Virus Type 1 Drug Resistance Mutations on Antiretroviral Therapy Outcomes.

J Infect Dis. 2024 Jul 25;230(1):86-94. doi: 10.1093/infdis/jiae131.

Combined Treatment of Severe Acute Respiratory Syndrome Coronavirus 2 Reduces Molnupiravir-Induced Mutagenicity and Prevents Selection for Nirmatrelvir/Ritonavir Resistance Mutations.

J Infect Dis. 2024 Dec 16;230(6):1380-1383. doi: 10.1093/infdis/jiae213.

Development and Validation of a Genotypic Assay to Quantify CXCR4- and CCR5-Tropic Human Immunodeficiency Virus Type-1 (HIV-1) Populations and a Comparison to Trofile.

Viruses. 2024 Mar 27;16(4):510. doi: 10.3390/v16040510.

The timing of HIV-1 infection of cells that persist on therapy is not strongly influenced by replication competency or cellular tropism of the provirus.

PLoS Pathog. 2024 Feb 29;20(2):e1011974. doi: 10.1371/journal.ppat.1011974. eCollection 2024 Feb.

Loss of West Nile virus genetic diversity during mosquito infection due to species-dependent population bottlenecks.

iScience. 2023 Aug 25;26(10):107711. doi: 10.1016/j.isci.2023.107711. eCollection 2023 Oct 20.
