Ge Jianye, Budowle Bruce, Planz John V, Eisenberg Arthur J, Ballantyne Jack, Chakraborty Ranajit
Department of Forensic and Investigative Genetics, University of North Texas Health Science Center, Ft Worth, TX 76107, USA.
Leg Med (Tokyo). 2010 Nov;12(6):289-95. doi: 10.1016/j.legalmed.2010.07.006.
A forensic Y-STR database generated in the US was compiled from profiles with partial or complete typing of 16 STR markers: DYS19, DYS385, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS456, DYS458, DYS635, DYS448, and Y GATA H4. This version of the database contained 17,447 samples, of which 77% and 20% were collected in North America and Asia, respectively. The database was divided into six general populations: African American, Asian, Caucasian, Hispanic, Indian, and Native American. Each population was further classified into subgroups according to geographic region. Some subgroups were tested, found to be homogeneous, and merged. Allele and haplotype frequencies, as well as sample sizes, were summarized. Of the full haplotypes (i.e., all 16 STRs typed with no missing data), 93.7% in the total population were distinct, 92.9% were population-specific, and 89.3% were observed only once. The majority of shared haplotypes were found among North American populations as a result of admixture over the past few hundred years. The power of discrimination (PD), coancestry coefficient (F(st)), and coefficient of gene differentiation (G(st)) were also calculated at the locus and haplotype levels. The most polymorphic marker was DYS385; this marker contains a tandem duplication and is actually composed of two loci. Both G(st) and F(st) estimates were very small for haplotypes composed of a large number of STRs (e.g., 10-16 markers), although G(st) was slightly more conservative for these extended haplotypes. With the Native American population removed from the total population dataset, the G(st) and F(st) estimates decreased further. For the total population dataset, PD across all 16 Y-STR markers was 0.9998. Three measures of Y-STR profile frequency were calculated: (1) unconditional haplotype frequency, (2) population substructure adjusted frequency, and (3) binomial upper bound of the haplotype frequency.
The binomial upper bound is the most conservative estimate for most forensic applications. The weight of a Y-STR haplotype can be estimated using either population-specific or total-population databases.
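The abstract reports PD and G(st) without giving formulas. As a minimal sketch, assuming the common definitions, PD can be taken as 1 - Σp_i² over observed haplotype frequencies (no n/(n - 1) correction), and Nei's G(st) as (H_T - H_S)/H_T with unweighted averages over subpopulations, treating each full haplotype (for example, a tuple of the 16 allele calls) as a single allele. The helper names and toy haplotype labels are hypothetical, the paper's exact estimators may differ, and its F(st) estimator is not shown.

```python
from collections import Counter
from typing import Dict, Hashable, List


def _frequencies(samples: List[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each distinct haplotype in a sample list."""
    n = len(samples)
    return {h: c / n for h, c in Counter(samples).items()}


def _diversity(freqs: Dict[Hashable, float]) -> float:
    """Haplotype (gene) diversity 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def power_of_discrimination(samples: List[Hashable]) -> float:
    """PD, ASSUMED here to be 1 - sum(p_i^2) over observed haplotype frequencies."""
    return _diversity(_frequencies(samples))


def nei_gst(subpops: Dict[str, List[Hashable]]) -> float:
    """Nei's G(st) = (H_T - H_S) / H_T, treating each haplotype as one allele.

    H_S is the unweighted mean within-subpopulation diversity; H_T is the
    diversity computed from the unweighted mean haplotype frequencies.
    """
    per_pop = [_frequencies(s) for s in subpops.values()]
    r = len(per_pop)
    h_s = sum(_diversity(f) for f in per_pop) / r
    all_haps = set().union(*per_pop)  # every haplotype seen in any subpopulation
    mean_freqs = {h: sum(f.get(h, 0.0) for f in per_pop) / r for h in all_haps}
    h_t = _diversity(mean_freqs)
    return (h_t - h_s) / h_t  # undefined if all samples carry one haplotype


if __name__ == "__main__":
    pop_a = ["h1", "h1", "h2", "h3"]  # hypothetical haplotype labels
    pop_b = ["h2", "h3", "h3", "h4"]
    print(power_of_discrimination(pop_a + pop_b))
    print(nei_gst({"A": pop_a, "B": pop_b}))
```

Because every sample is reduced to one haplotype label, the same two functions apply unchanged whether a "haplotype" spans one locus or all 16.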