Institute of Legal Medicine, Innsbruck Medical University, Innsbruck, Austria; Forensic Science Program, The Pennsylvania State University, University Park, PA, USA.
Faculty of Life Sciences, King's College, London, UK.
Forensic Sci Int Genet. 2016 May;22:54-63. doi: 10.1016/j.fsigen.2016.01.009. Epub 2016 Jan 21.
The DNA Commission of the International Society for Forensic Genetics (ISFG) is reviewing factors that need to be considered ahead of the adoption by the forensic community of short tandem repeat (STR) genotyping by massively parallel sequencing (MPS) technologies. MPS produces sequence data that provide a precise description of the repeat allele structure of an STR marker and of variants that may reside in the flanking areas of the repeat region. When an STR contains a complex arrangement of repeat motifs, the level of genetic polymorphism revealed by the sequence data can increase substantially. As repeat structures can be complex and include substitutions, insertions, deletions, variable tandem repeat arrangements of multiple nucleotide motifs, and flanking region SNPs, established capillary electrophoresis (CE) allele descriptions must be supplemented by a new system of STR allele nomenclature that retains backward compatibility with the CE data that currently populate national DNA databases and that will continue to be produced in the coming years. Thus, there is a pressing need for a standardized framework for describing complex sequences that enables comparison with the currently used repeat allele nomenclature derived from conventional CE systems. It is important to distinguish three levels of information, in hierarchical order: (i) the sequence, (ii) the alignment, and (iii) the nomenclature of STR sequence data. We propose a sequence (text) string format as the minimal requirement for data storage that laboratories should follow when adopting MPS of STRs. We further discuss the variant annotation and sequence comparison framework necessary to maintain compatibility among established and future data. This system must be easy for the DNA specialist to use and interpret, must be based on a universally accessible genome assembly, and must be in place before the uptake of MPS by the general forensic community starts to generate sequence data on a large scale.
While the established nomenclature for CE-based STR analysis will remain unchanged in the future, the nomenclature of sequence-based STR genotypes will need to follow updated rules and be generated by expert systems that translate MPS sequences to match CE conventions in order to guarantee compatibility between the different generations of STR data.
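As a rough illustration of what translating a stored sequence string into a CE-compatible designation might involve, the following Python sketch derives a length-based allele number and a bracketed repeat summary from a repeat-region string. It is not the Commission's proposed system: the function names `ce_allele` and `bracketed`, the fixed four-base motif length, and the toy sequences are assumptions made only for this example, and the input is assumed to be the repeat region alone, already trimmed of flanking sequence.

```python
from itertools import groupby


def ce_allele(repeat_region: str, motif_len: int = 4) -> str:
    """Length-based allele label in the CE style (hypothetical helper).

    Counts complete motif-length units; leftover bases are reported
    after a dot, e.g. 39 bases with a 4-base motif gives "9.3".
    """
    full, partial = divmod(len(repeat_region), motif_len)
    return f"{full}.{partial}" if partial else str(full)


def bracketed(repeat_region: str, motif_len: int = 4) -> str:
    """Collapse runs of identical motif-length chunks (hypothetical helper).

    Uses a fixed reading frame from the first base, so it does not
    re-align after an incomplete repeat unit.
    """
    chunks = [
        repeat_region[i:i + motif_len]
        for i in range(0, len(repeat_region), motif_len)
    ]
    parts = []
    for motif, run in groupby(chunks):
        count = len(list(run))
        parts.append(f"[{motif}]{count}" if count > 1 else motif)
    return " ".join(parts)


# Toy repeat regions chosen for illustration only.
interrupted = "TCTA" * 3 + "TCTG" + "TCTA" * 5
microvariant = "AATG" * 6 + "ATG" + "AATG" * 3

print(ce_allele(interrupted), bracketed(interrupted))
print(ce_allele(microvariant))
```

In this sketch the full string remains the stored record, and both the allele number and the bracketed form are derived views of it, which mirrors the sequence, alignment, nomenclature hierarchy described above. A real expert system would also need alignment to a reference assembly to locate the repeat region and to handle flanking variants, which this example leaves out.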