Forensic Genetics Research Group, Department of Forensic Sciences, Oslo University Hospital, Norway.
DNA Support Unit, Federal Bureau of Investigation Laboratory, USA; National Biodefense Analysis and Countermeasures Center, USA.
Forensic Sci Int Genet. 2020 Sep;48:102319. doi: 10.1016/j.fsigen.2020.102319. Epub 2020 Jun 4.
The increased interest in using Massively Parallel Sequencing (MPS) technologies to type traditional autosomal STR markers raises multiple questions regarding interpretation of the results via probabilistic genotyping. To begin to address some of those questions, we examined the effects of using differing degrees of sequence information, pre-filtering, and data modeling to interpret complex MPS-STR mixtures with probabilistic genotyping software. Sixty ForenSeq typing results for mixtures of two to four contributors were (1) represented in three separate formats that captured different degrees of sequence information and (2) analyzed using three different filtering approaches prior to probabilistic interpretation. All mixtures for the different format and filtering variants were subsequently interpreted with respect to ten reference profiles, using both qualitative (LRmix) and quantitative (EuroForMix) models to calculate the likelihood ratio (LR). The LR results indicated moderate information gain when the STR nomenclature was based upon the longest uninterrupted stretch (LUS) compared with conventional capillary electrophoresis repeat units (RU), whereas additional gains were very small when the complete sequence information was utilized. Use of a static analytical threshold for data pre-filtering improved LRs compared to a dynamic (percentage-based) threshold, as the static threshold prevented excessive filtering of alleles originating from minor contributors. For interpretations performed using a quantitative model, a small improvement in performance was observed if a stutter model was employed instead of using stutter thresholds to pre-filter the data, whereas, as expected, performance worsened considerably under the qualitative model when stutter was not pre-filtered. Given the empirical and theoretical findings in this study, we discuss the value of utilizing sequence-level information and potential paths forward to increase information gain using MPS systems.
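The difference between the static and dynamic analytical thresholds can be illustrated with a minimal sketch. The function names, the read counts, and the threshold values (30 reads; 1.5% of locus coverage) are hypothetical and chosen only for illustration; they are not the settings or data used in the study, and this is not the filtering code of ForenSeq, LRmix, or EuroForMix.

```python
from typing import Dict


def static_filter(reads: Dict[str, int], min_reads: int) -> Dict[str, int]:
    """Keep alleles whose read count meets a fixed (static) threshold."""
    return {allele: n for allele, n in reads.items() if n >= min_reads}


def dynamic_filter(reads: Dict[str, int], fraction: float) -> Dict[str, int]:
    """Keep alleles whose read count meets a fraction of total locus coverage."""
    cutoff = fraction * sum(reads.values())
    return {allele: n for allele, n in reads.items() if n >= cutoff}


# Hypothetical read counts at one locus of an unbalanced two-person mixture:
# alleles 12 and 13 from the major contributor, 9 and 10 from the minor
# contributor, and allele 15 representing low-level noise.
locus = {"12": 2000, "13": 1800, "9": 40, "10": 35, "15": 5}

kept_static = static_filter(locus, min_reads=30)     # assumed threshold: 30 reads
kept_dynamic = dynamic_filter(locus, fraction=0.015)  # assumed threshold: 1.5%

print(sorted(kept_static))   # alleles surviving the static threshold
print(sorted(kept_dynamic))  # alleles surviving the dynamic threshold
```

With these assumed numbers the percentage-based cutoff scales with the high coverage contributed by the major donor (0.015 × 3880 ≈ 58 reads) and removes both minor-contributor alleles, while the static cutoff retains them and still removes the noise allele.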