National Institute of Standards and Technology, 100 Bureau Drive, M/S 8314, Gaithersburg, MD 20899, USA.
Forensic Sci Int Genet. 2022 Mar;57:102655. doi: 10.1016/j.fsigen.2021.102655. Epub 2021 Dec 28.
This manuscript reports Y-chromosomal short tandem repeat (Y-STR) haplotypes for 1032 male U.S. population samples across 30 Y-STR loci characterized by three capillary electrophoresis (CE) length-based kits (PowerPlex Y23 System, Yfiler Plus PCR Amplification Kit, and Investigator Argus Y-28 QS Kit) and one sequence-based kit (ForenSeq DNA Signature Prep Kit): DYF387S1, DYS19, DYS385 a/b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS449, DYS456, DYS458, DYS460, DYS481, DYS505, DYS518, DYS522, DYS533, DYS549, DYS570, DYS576, DYS612, DYS627, DYS635, DYS643, and Y-GATA-H4. The length-based Y-STR haplotypes include six loci that are not reported in the sequence-based kit (DYS393, DYS449, DYS456, DYS458, DYS518, and DYS627), whereas three loci included in the sequence-based kit are not present in the length-based kits (DYS505, DYS522, and DYS612). For the latter three loci, a custom multiplex was used to generate CE length-based data, allowing all 1032 samples to be evaluated for concordance across the 30 Y-STR loci included in these four commercial Y-STR typing kits. Discordances between typing methods were analyzed further to assess underlying causes such as primer binding site mutations and flanking region insertions/deletions. Allele-level frequencies and statistical information are provided for sequenced loci, excluding the multi-copy loci DYF387S1 and DYS385 a/b, for which locus-specific haplotype-level frequencies are provided instead. The resulting data reveal the degree of information gained through sequencing: 88% of sequenced Y-STR loci contain additional sequence-based alleles compared to length-based data, with the DYS389II locus containing the most additional alleles (51) observed by sequencing. Despite these allelic increases, sequencing gave only minimal improvement in haplotype resolution, and all four commercial kits provided a similar ability to differentiate length-based haplotypes in this sample set. Finally, a subset of 369 male samples was compared with the corresponding father samples, which were also sequenced, revealing the sequence basis for the 50 length-based changes observed and no additional sequence-based mutations. GenBank accession numbers are reported for each unique sequence, and associated records are available in the STRSeq Y-Chromosomal STR Loci National Center for Biotechnology Information (NCBI) BioProject, accession PRJNA380347. Haplotype data for the 'NIST 1032' data set have been updated in the Y-STR Haplotype Reference Database (YHRD) and now reach the YHRD maximal haplotype level. All supplementary files, including revisions to previously published Y-STR data, are available in the NIST Public Data Repository: U.S. population data for human identification markers, DOI 10.18434/t4/1500024.
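The contrast between allele-level gain and haplotype-level resolution can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the four samples, the allele values, the sequence labels such as "29-a", and the helper names `distinct_alleles` and `distinct_haplotypes` are invented for illustration and are not taken from the NIST 1032 data set or from any kit software. It only shows how a locus can gain a sequence-based allele without separating any additional haplotypes.

```python
# Toy illustration only: all samples, alleles, and sequence labels are invented.
# Each sample maps a locus name to (length-based allele, sequence-based label).
SAMPLES = {
    "S1": {"DYS389II": (29, "29-a"), "DYS390": (24, "24-a")},
    "S2": {"DYS389II": (29, "29-b"), "DYS390": (25, "25-a")},
    "S3": {"DYS389II": (30, "30-a"), "DYS390": (24, "24-a")},
    "S4": {"DYS389II": (29, "29-a"), "DYS390": (24, "24-a")},
}

LENGTH, SEQUENCE = 0, 1  # index into the (length, sequence) tuple


def distinct_alleles(samples, locus, level):
    """Count distinct alleles at one locus, by length or by sequence."""
    return len({profile[locus][level] for profile in samples.values()})


def distinct_haplotypes(samples, level):
    """Count distinct multi-locus haplotypes, by length or by sequence."""
    loci = sorted(next(iter(samples.values())))
    haplotypes = {
        tuple(profile[locus][level] for locus in loci)
        for profile in samples.values()
    }
    return len(haplotypes)


if __name__ == "__main__":
    for locus in ("DYS389II", "DYS390"):
        by_length = distinct_alleles(SAMPLES, locus, LENGTH)
        by_sequence = distinct_alleles(SAMPLES, locus, SEQUENCE)
        print(locus, "length alleles:", by_length, "sequence alleles:", by_sequence)
    print("length haplotypes:", distinct_haplotypes(SAMPLES, LENGTH))
    print("sequence haplotypes:", distinct_haplotypes(SAMPLES, SEQUENCE))
```

In this toy set, sequencing splits the length-29 allele at one locus into two sequence variants, but the sample carrying the new variant was already distinguishable by length at the other locus, so the number of distinct haplotypes does not change.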