Reducing noise and stutter in short tandem repeat loci with unique molecular identifiers.

Affiliations

Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA; Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA.

Center for Human Identification, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA.

Publication information

Forensic Sci Int Genet. 2021 Mar;51:102459. doi: 10.1016/j.fsigen.2020.102459. Epub 2020 Dec 25.

DOI:10.1016/j.fsigen.2020.102459

PMID:33429137

Abstract

Unique molecular identifiers (UMIs) are a promising approach to contend with errors generated during PCR and massively parallel sequencing (MPS). With UMI technology, random molecular barcodes are ligated to template DNA molecules prior to PCR, allowing PCR and sequencing errors to be tracked and corrected bioinformatically. UMIs have the potential to be particularly informative for the interpretation of short tandem repeats (STRs). Traditional MPS approaches may simply lead to the observation of alleles that are consistent with the hypothesis of stutter, whereas with UMIs stutter products may be bioinformatically re-associated with their parental alleles and subsequently removed. Herein, a bioinformatics pipeline named strumi is described that is designed for the analysis of STRs that are tagged with UMIs. Unlike other tools, strumi is an alignment-free, machine-learning-driven algorithm that clusters individual MPS reads into UMI families, infers consensus super-reads that represent each family, and provides an estimate of the resulting haplotype's accuracy. Super-reads, in turn, approximate independent measurements not of the PCR products but of the original template molecules, in terms of both quantity and sequence identity. Provisional assessments show that naïve threshold-based approaches generate super-reads that are accurate (~97% haplotype accuracy, compared to ~78% when UMIs are not used), and the application of a more nuanced machine learning approach increases the accuracy to ~99.5%, depending on the level of certainty desired. With these features, UMIs may greatly simplify probabilistic genotyping systems and reduce uncertainty. However, the ability to interpret alleles at trace levels also permits the interpretation, characterization and quantification of contamination as well as somatic variation (including somatic stutter), which may present newfound challenges.
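
The idea of a naïve threshold-based super-read can be illustrated with a minimal sketch. This is not strumi's implementation: the function name `build_super_reads`, the parameters `min_family_size` and `min_agreement`, and the toy reads are all hypothetical. The sketch also assumes error-free UMIs that can be grouped by exact match, whereas the abstract describes a clustering step. It groups reads by UMI and keeps the most common whole-read sequence in each family (no alignment) when enough reads agree.

```python
from collections import Counter, defaultdict


def build_super_reads(reads, min_family_size=3, min_agreement=0.5):
    """Collapse (umi, sequence) pairs into one consensus "super-read" per UMI.

    Hypothetical illustration only: families are formed by exact UMI match,
    and the consensus is the most common whole-read sequence in the family.
    Returns {umi: (consensus_sequence, agreement_fraction, family_size)}.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    super_reads = {}
    for umi, seqs in families.items():
        # Small families carry too little evidence to vote on a consensus.
        if len(seqs) < min_family_size:
            continue
        consensus, count = Counter(seqs).most_common(1)[0]
        fraction = count / len(seqs)
        # Keep the family only if a strict majority supports one sequence.
        if fraction > min_agreement:
            super_reads[umi] = (consensus, fraction, len(seqs))
    return super_reads


# Toy data: a TCTA repeat locus with a 4-repeat and a 3-repeat template.
reads = [
    ("ACGTAC", "TCTATCTATCTATCTA"),
    ("ACGTAC", "TCTATCTATCTATCTA"),
    ("ACGTAC", "TCTATCTATCTA"),      # stutter product, one repeat shorter
    ("ACGTAC", "TCTATCTATCTATCTA"),
    ("GGTTAA", "TCTATCTATCTA"),
    ("GGTTAA", "TCTATCTATCTA"),
    ("GGTTAA", "TCTATCTATCTA"),
    ("TTTTTT", "TCTATCTATCTATCTA"),  # singleton family, discarded
]

result = build_super_reads(reads)
for umi, (seq, fraction, size) in sorted(result.items()):
    print(umi, seq, fraction, size)
```

In this toy input, the stutter read in family `ACGTAC` is outvoted by its three siblings, so the family yields a single 4-repeat super-read, while family `GGTTAA` is reported as a genuine 3-repeat template. Counting super-reads rather than raw reads is what lets each family stand in for one original template molecule.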

