School of Public Health, Forensic Science Group, U.C. Berkeley, Berkeley, CA, United States; DNA·VIEW, 6801 Thornhill Drive, Oakland, CA 94611-1336, USA.
Forensic Sci Int Genet. 2010 Oct;4(5):281-91. doi: 10.1016/j.fsigen.2009.10.013. Epub 2010 Jan 12.
Y-chromosomal and mitochondrial haplotyping offer special advantages for criminal (and other) identification. For different reasons, each of them is sometimes detectable in a crime stain for which autosomal typing fails. But they also present special problems, including a fundamental mathematical one: when a rare haplotype is shared between suspect and crime scene, how strong is the evidence linking the two? Assume a reference population sample containing n-1 haplotypes is available. The most interesting situation, and also the most common, is that the crime scene haplotype has never been observed in the population sample. The traditional tools of product rule and sample frequency are not useful when there are no components to multiply and the sample frequency is zero. A useful statistic is the fraction κ of the population sample that consists of "singletons", that is, types observed exactly once. A simple argument shows that the probability for a random innocent suspect to match a previously unobserved crime scene type is (1-κ)/n, which is distinctly less than 1/n and likely ten times less. The robust validity of this model is confirmed by testing it against a range of population models. This paper hinges above all on one key insight: probability is not frequency. The common but erroneous "frequency" approach adopts population frequency as a surrogate for matching probability and attempts the intractable problem of guessing how many instances of the specific haplotype exist for a given crime. Probability, by contrast, depends by definition only on the available data. Hence if different haplotypes with the same data occur in two different crimes, the matching probabilities are the same and are not hard to find, even though the frequencies differ (and are hopelessly elusive).
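The match probability stated in the abstract, (1-κ)/n, can be illustrated with a small Python sketch. This is an illustration under stated assumptions, not the paper's own procedure: the function names and the toy sample are hypothetical, and κ is computed here on the reference sample as given (n-1 haplotypes). The abstract does not settle whether κ should instead be computed after adding the crime scene type to the sample.

```python
from collections import Counter


def singleton_fraction(sample):
    """Fraction of the sample made up of haplotypes observed exactly once (kappa).

    Assumption: kappa is computed on the reference sample as given.
    """
    if not sample:
        raise ValueError("reference sample must not be empty")
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


def match_probability_unobserved(sample):
    """(1 - kappa) / n for a crime scene type absent from the sample.

    The sample holds n - 1 haplotypes, so n = len(sample) + 1.
    """
    n = len(sample) + 1
    kappa = singleton_fraction(sample)
    return (1 - kappa) / n


# Hypothetical toy sample of 9 haplotype labels (so n = 10):
# "A" is seen twice, "B" through "H" are singletons.
toy_sample = ["A", "A", "B", "C", "D", "E", "F", "G", "H"]

kappa = singleton_fraction(toy_sample)
p_match = match_probability_unobserved(toy_sample)
naive = 1 / (len(toy_sample) + 1)

print(f"kappa = {kappa:.3f}")
print(f"(1 - kappa)/n = {p_match:.4f}  versus  1/n = {naive:.4f}")
```

In this toy sample seven of the nine observations are singletons, so (1-κ)/n is well below 1/n. Real Y-chromosomal or mitochondrial databases would give different values of κ, and the size of the reduction depends entirely on that singleton fraction.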