National Forensic Centre, Swedish Police Authority, Linköping SE-581 94, Sweden.
National Institute of Standards and Technology, 100 Bureau Drive, M/S 8314, Gaithersburg, MD 20899, USA.
Forensic Sci Int Genet. 2024 Jul;71:103047. doi: 10.1016/j.fsigen.2024.103047. Epub 2024 Apr 3.
Massively parallel sequencing (MPS) is increasingly applied in forensic short tandem repeat (STR) analysis. However, stutter artefacts and other PCR or sequencing errors in MPS-STR data partly limit the detection of low DNA amounts, e.g., in complex mixtures. Unique molecular identifiers (UMIs) have been applied in several scientific fields to reduce noise in sequencing. A UMI is a stretch of random nucleotides that serves as a unique barcode for each starting DNA molecule and is incorporated into the DNA template by either ligation or PCR. Reads sharing the same barcode are collapsed into consensus reads, which removes errors introduced during subsequent amplification and sequencing. The SiMSen-Seq (Simple, multiplexed, PCR-based barcoding of DNA for sensitive mutation detection using sequencing) method introduces UMIs by PCR and combines a sophisticated hairpin design, which reduces unspecific primer binding, with PCR protocol adjustments that further optimize the reaction. In this study, SiMSen-Seq is applied to develop a proof-of-concept seven-STR multiplex for MPS library preparation, together with an associated bioinformatics pipeline. Additionally, machine learning (ML) models were evaluated to further improve UMI-based allele calling. Overall, the seven-STR multiplex gave complete detection and concordant alleles for 47 single-source samples at 1 ng input DNA as well as for low-template samples at 62.5 pg input DNA. For twelve challenging mixtures with minor contributions of 10 pg to 150 pg and ratios of 1–15% relative to the major donor, 99.2% of the expected alleles were detected by applying the UMIs in combination with an ML filter. The main impact of UMIs was a substantially lower number of artefacts as well as reduced stutter ratios, which were generally below 5% of the parental allele. In conclusion, UMI-based STR sequencing opens new avenues for improved analysis of challenging crime scene samples, including complex mixtures.
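To make the consensus idea concrete, the sketch below groups reads by UMI, keeps one consensus sequence per UMI family, and compares the stutter ratio (stutter count divided by parental allele count) before and after collapsing. It is a generic illustration, not the SiMSen-Seq bioinformatics pipeline or the ML filter described above. The function names, the thresholds `min_family_size` and `min_agreement`, the AGAT repeat motif and the UMI strings are all hypothetical, and whole-read majority voting stands in for whatever consensus method the authors used. It also ignores sequencing errors within the UMIs themselves, and it cannot remove a stutter that arises before or during barcoding, since such an error is copied into every read of its family.

```python
from collections import Counter, defaultdict

def collapse_umi_families(reads, min_family_size=3, min_agreement=0.7):
    """Collapse (umi, sequence) reads into one consensus sequence per UMI.

    Hypothetical thresholds: families smaller than min_family_size are
    dropped, and a family is kept only if its most common sequence makes
    up at least min_agreement of the family's reads.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to call a reliable consensus
        seq, count = Counter(seqs).most_common(1)[0]
        if count / len(seqs) >= min_agreement:
            consensus[umi] = seq  # minority (e.g. stutter) reads are outvoted
    return consensus

def stutter_ratio(allele_counts, parent, stutter):
    """Stutter count as a fraction of the parental allele count."""
    return allele_counts.get(stutter, 0) / allele_counts[parent]

PARENT = "AGAT" * 12   # hypothetical 12-repeat allele
STUTTER = "AGAT" * 11  # its n-1 stutter product

reads = [
    ("ACGTACGTAC", PARENT), ("ACGTACGTAC", PARENT),
    ("ACGTACGTAC", PARENT), ("ACGTACGTAC", STUTTER),
    ("TTGCAAGCTA", PARENT), ("TTGCAAGCTA", PARENT), ("TTGCAAGCTA", PARENT),
    ("GGATCCATGA", STUTTER), ("GGATCCATGA", PARENT),
]

raw_counts = Counter(seq for _, seq in reads)                # counts every read
umi_counts = Counter(collapse_umi_families(reads).values())  # one count per UMI

print(round(stutter_ratio(raw_counts, PARENT, STUTTER), 3))  # 0.286
print(stutter_ratio(umi_counts, PARENT, STUTTER))            # 0.0
```

In this toy data the single stutter read in the first family is outvoted, and the two-read family is discarded as too small, so the stutter ratio drops from 2/7 on raw reads to zero on consensus reads. Real data would retain some residual stutter, consistent with the reduced but non-zero stutter ratios reported above.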