Department of Chemistry, Rutgers University, Camden, NJ 08102, USA; Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA; Program in Biomedical Forensic Sciences, Boston University, Boston, MA 02118, USA.
Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA.
Forensic Sci Int Genet. 2024 Mar;69:103000. doi: 10.1016/j.fsigen.2023.103000. Epub 2023 Dec 19.
In the absence of a suspect, the forensic aim is investigative, and the focus is on discerning which genotypes best explain the evidence. In traditional systems, the list of candidate genotypes may become vast if the sample contains DNA from many donors or if the information from a minor contributor is swamped by that of major contributors. This lowers the evidential value of a true donor's contribution and may, as a result, produce overlooked or inefficient investigative leads. Recent developments in single-cell analysis offer a way forward by producing data capable of discriminating genotypes. This is accomplished by first clustering single-cell data by similarity, without reference to a known genotype. With good clustering, it is reasonable to assume that the single-cell electropherograms (scEPGs) in a cluster originate from a single contributor. Under that assumption, we determine the probability of a cluster's content given each possible genotype at each locus, and then apply Bayes' rule to obtain the posterior probability mass distribution over all genotypes. A decision criterion is then applied such that the summed probability of the ranked genotypes included in the set is at least 1−α. This is the credible genotype set, and it is used to inform database search criteria. In this work, we demonstrate the salience of single-cell analysis by performance testing a set of 630 previously constructed admixtures containing up to 5 donors with balanced and unbalanced contributions. The scEPGs were generated by isolating single cells, employing a direct-to-PCR extraction treatment, amplifying STRs that are compliant with existing national databases, and applying post-PCR treatments that achieve a detection limit of one DNA copy. We determined that, for these test data, 99.3% of the true genotypes were included in the 99.8% credible set, regardless of the number of donors that comprised the mixture.
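The abstract names two computational steps without giving the underlying model: a Bayes'-rule posterior over every genotype at a locus, given the scEPGs in one cluster, followed by a ranked credible set whose summed probability reaches 1−α. The sketch below illustrates both steps under a deliberately simple, hypothetical per-cell model that is assumed here and is not the authors' scEPG model. Each genotype allele drops out independently with probability `dropout` (squared for homozygotes, which carry two copies), a non-genotype allele drops in with probability `dropin` times its population frequency, and priors follow Hardy-Weinberg proportions. All function names, allele frequencies, and parameter values are illustrative.

```python
from itertools import combinations_with_replacement
from math import prod

# Hypothetical toy model for illustration only; NOT the published scEPG model.


def genotype_prior(g, freqs):
    """Hardy-Weinberg prior for an unordered genotype g = (a, b)."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def cell_likelihood(observed, g, freqs, dropout, dropin):
    """P(alleles observed in one scEPG | genotype g) under an independent
    per-allele detection model. Alleles absent from `freqs` are ignored."""
    p = 1.0
    for allele, f in freqs.items():
        seen = allele in observed
        if allele in g:
            # A homozygote carries two copies, so both must drop out.
            d = dropout ** 2 if g[0] == g[1] else dropout
            p *= (1 - d) if seen else d
        else:
            q = dropin * f  # drop-in chance weighted by allele frequency
            p *= q if seen else (1 - q)
    return p


def locus_posterior(cluster, freqs, dropout=0.1, dropin=0.01):
    """Posterior over all genotypes at one locus via Bayes' rule, treating
    every scEPG in the cluster as coming from the same single donor."""
    genotypes = combinations_with_replacement(sorted(freqs), 2)
    weights = {
        g: genotype_prior(g, freqs)
        * prod(cell_likelihood(obs, g, freqs, dropout, dropin) for obs in cluster)
        for g in genotypes
    }
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def credible_set(posterior, alpha):
    """Add genotypes in descending posterior order until their summed
    probability is at least 1 - alpha."""
    chosen, mass = [], 0.0
    for g, p in sorted(posterior.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(g)
        mass += p
        if mass >= 1 - alpha:
            break
    return chosen


if __name__ == "__main__":
    freqs = {"12": 0.2, "13": 0.3, "14": 0.5}      # illustrative frequencies
    cluster = [{"12", "14"}, {"12", "14"}, {"14"}]  # third cell lost allele 12
    post = locus_posterior(cluster, freqs)
    print(credible_set(post, alpha=0.002))          # → [('12', '14')]
```

With α = 0.002, which corresponds to the 99.8% level reported above, this toy cluster yields a single credible genotype despite one allelic dropout. A diffuse posterior would instead return several genotypes, widening the set of candidates passed to a database search.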
We also determined that the most probable genotype was the true genotype for 97% of the loci when a cluster contained at least two cells. Because efficient investigative leads arise from posterior mass distributions that are narrow and concentrated at the true genotype, we report that, for this test set, 47,900 (86%) loci returned only one credible genotype, and of these, 47,551 (99%) were the true genotype. When the likelihood ratio (LR) was computed for true contributors, 91% of the clusters yielded LR > 10, showing the potential of single-cell data to positively affect investigative reporting.
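The abstract does not state the propositions behind the reported LR. One common single-source formulation, assumed here, compares Hp (the cluster's donor is the person of interest with genotype g_poi) against Hd (the donor is an unknown, unrelated individual). Per locus, LR = P(E | g_poi) / Σ_g P(E | g)·π(g), where π is the population prior. Per-locus values are then multiplied under an assumed independence between loci. Equivalently, the per-locus LR equals the posterior of g_poi divided by its prior. The likelihood values in the sketch are hypothetical placeholders, and the function names are illustrative.

```python
from math import prod


def locus_lr(likelihoods, priors, poi_genotype):
    """Single-locus LR for Hp (cluster donor is the POI) vs Hd (donor is an
    unknown, unrelated person). likelihoods[g] = P(cluster data | g)."""
    numerator = likelihoods[poi_genotype]
    denominator = sum(likelihoods[g] * priors[g] for g in likelihoods)
    return numerator / denominator


def cluster_lr(loci):
    """Multiply per-locus LRs, assuming the loci are independent."""
    return prod(locus_lr(lik, pri, poi) for lik, pri, poi in loci)


if __name__ == "__main__":
    # Hypothetical per-locus likelihoods with Hardy-Weinberg priors.
    locus_a = (
        {("12", "12"): 0.001, ("12", "14"): 0.05, ("14", "14"): 0.002},
        {("12", "12"): 0.16, ("12", "14"): 0.48, ("14", "14"): 0.36},
        ("12", "14"),
    )
    locus_b = (
        {("9", "9"): 0.3, ("9", "10"): 0.01, ("10", "10"): 0.0001},
        {("9", "9"): 0.25, ("9", "10"): 0.5, ("10", "10"): 0.25},
        ("9", "9"),
    )
    print(f"cluster LR = {cluster_lr([locus_a, locus_b]):.2f}")
```

Reporting thresholds such as LR > 10 would be applied to the multi-locus product. The per-locus factors are bounded above by 1/π(g_poi), so rarer true genotypes can contribute larger factors.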