Steele Christopher D, Greenhalgh Matthew, Balding David J
Stat Appl Genet Mol Biol. 2016 Oct 1;15(5):431-445. doi: 10.1515/sagmb-2016-0038.
In recent years, statistical models for the analysis of complex (low-template and/or mixed) DNA profiles have moved from using only presence/absence information about allelic peaks in an electropherogram to quantitative use of peak heights. This is challenging because peak heights are highly variable and affected by a number of factors. We present a new peak-height model with important novel features, including over- and double-stutter, and a new approach to dropin. Our model is incorporated in the open-source R package likeLTD. We apply it to 108 laboratory-generated crime-scene profiles and demonstrate techniques of model validation that are novel in the field. We use the results to explore the benefits of modeling peak heights, finding that it is not always advantageous, and to assess the merits of pre-extraction replication. We also introduce an approximation that can reduce computational complexity when there are multiple low-level contributors who are not of interest to the investigation, and we present a simple approximate adjustment for linkage between loci, making it possible to accommodate linkage when evaluating complex DNA profiles.