Armed Forces Medical Examiner System, Armed Forces DNA Identification Laboratory, Dover, Delaware, USA; SNA International, Alexandria, Virginia, USA; Department of Immunology, Genetics and Pathology, Uppsala University, 751 08, Uppsala, Sweden.
Institute of Legal Medicine, Medical University of Innsbruck, Innsbruck, Austria; Forensic Science Program, The Pennsylvania State University, University Park, Pennsylvania, USA.
Forensic Sci Int Genet. 2020 Jan;44:102205. doi: 10.1016/j.fsigen.2019.102205. Epub 2019 Nov 10.
Advancements in sequencing technologies allow for rapid and efficient analysis of mitochondrial DNA (mtDNA) in forensic laboratories, which is particularly beneficial for specimens with limited nuclear DNA. Next generation sequencing (NGS) offers higher throughput and sensitivity than traditional Sanger-type sequencing (STS), as well as the ability to analyze the data quantitatively. Changes in the sample preparation, sequencing method and analysis required for NGS may alter the mtDNA haplotypes relative to previously generated STS data. Thus, the present study aimed to characterize the impact of different sequencing workflows on the detection and interpretation of length heteroplasmy (LHP), a particularly complicated aspect of mtDNA analysis. Whole mtDNA genome (mitogenome) data were generated for 16 high-quality samples using well-established Illumina and Ion methods, and the NGS data were compared to previously generated STS mtDNA control region data. Although the mitogenome haplotypes were concordant with the exception of length and low-level variants (<30 % variant frequency), LHP in the hypervariable segment (HVS) polycytosine regions (C-tracts) differed across sequencing methods. Consistent with previous studies, LHP in the STS data was observed in HVS1 in samples with nine or more consecutive cytosines (Cs) and in HVS2 in samples with eight Cs. The Illumina data produced an LHP pattern similar to that of the STS data, whereas the Ion data were noticeably different. More complex LHP (i.e. more molecules of differing length) was observed in the Ion data, as length variation occurred in multiple homopolymer stretches within the targeted HVS regions. Further, the STS dominant or major molecule (MM) differed from the Ion MM in 11 (37 %) of the 30 regions evaluated and from the Illumina MM in six (20 %). This is of particular interest, as the MM is used by many forensic laboratories to report the HVS C-tract in the mtDNA haplotype.
In general, the STS MMs were longer than the Illumina MMs, while the Ion MMs were the shortest. The higher rate of homopolymer indels in Ion data likely contributed to these differences. Supplemental analysis with alternative approaches demonstrated that the LHP pattern may also be altered by the bioinformatic tool and workflow used for data interpretation. The broader application of NGS in forensic laboratories will undoubtedly result in the use of varying sample preparation and sequencing methods. Based on these findings, minor LHP differences are expected across sequencing workflows, and it will be important that C-tract indels continue to be ignored for forensic queries and comparisons.
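The major molecule comparison summarized above can be illustrated with a minimal sketch. It assumes the MM of a C-tract is simply the length variant supported by the most reads, which is a simplification and not necessarily the rule applied in the study. The function names and the read counts are hypothetical. Only the 11-of-30 and 6-of-30 figures come from the abstract.

```python
def major_molecule(length_counts):
    """Return the C-tract length with the most supporting reads.

    Assumed rule for illustration only; ties resolve to the first
    length encountered in the dictionary.
    """
    return max(length_counts, key=length_counts.get)


def count_discordant(reference_mms, test_mms):
    """Count regions whose MM differs between two sequencing methods."""
    return sum(
        1 for region, mm in reference_mms.items() if test_mms[region] != mm
    )


def percent_discordant(n_differing, n_regions):
    """Express a discordance count as a whole-number percentage."""
    return round(100 * n_differing / n_regions)


# Hypothetical read counts per C-tract length for one HVS1 region.
sts_like = {9: 120, 10: 300, 11: 80}
ion_like = {8: 90, 9: 260, 10: 150}

sts_mm = major_molecule(sts_like)   # 10
ion_mm = major_molecule(ion_like)   # 9

# One region, MMs differ, so one discordant region.
n_diff = count_discordant({"HVS1": sts_mm}, {"HVS1": ion_mm})

# Figures reported in the abstract: 11 and 6 of 30 regions.
ion_pct = percent_discordant(11, 30)       # 37
illumina_pct = percent_discordant(6, 30)   # 20
```

In this toy region the Ion-like MM is one cytosine shorter than the STS-like MM, matching the direction of the trend reported above. The actual read distributions in the study are not given in the abstract.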