Ramani Anantharaman, Wong Yongxun, Tan Si Zhen, Shue Bing Hong, Syn Christopher
Biology Division, Applied Sciences Group, Health Sciences Authority, 11 Outram Road, Singapore 169078, Singapore.
Forensic Sci Int Genet. 2017 Nov;31:171-179. doi: 10.1016/j.fsigen.2017.08.013. Epub 2017 Aug 15.
The ability to predict bio-geographic ancestry can be valuable in generating investigative leads towards solving crimes. Ancestry informative marker (AIM) sets include large numbers of SNPs to predict an ancestral population. Massively parallel sequencing has enabled forensic laboratories to genotype a large number of such markers in a single assay. Illumina's ForenSeq DNA Signature Kit includes the ancestry informative SNPs reported by Kidd et al. In this study, the ancestry prediction capabilities of the ForenSeq kit, sequenced on the MiSeq FGx, were evaluated in 1030 unrelated Singapore population samples of Chinese, Malay and Indian origin. A total of 59 SNPs, comprising ancestry SNPs and phenotypic SNPs with AIM properties, were selected. The bio-geographic ancestry of all 1030 samples was predicted with Illumina's ForenSeq Universal Analysis Software (UAS). Of the genotyped samples, 712 were used as a training set to generate ancestry prediction models with STRUCTURE and Snipper, and the performance of both models was tested on the remaining 318 samples. Ancestry prediction in UAS correctly classified the Singapore Chinese as part of the East Asian cluster, whereas Indians clustered with Admixed Americans and Malays clustered between these two reference populations. Principal component analysis showed that the 59 SNPs accounted for only 26% of the variation between the Singapore sub-populations. Their discriminatory potential was also lower (G = 0.085) than that reported in ALFRED (F = 0.357). The Snipper algorithm correctly predicted bio-geographic ancestry in 91% of Chinese and Indian individuals and 88% of Malay individuals, while the success rates of the STRUCTURE algorithm were 94% in Chinese, 80% in Malay and 91% in Indian individuals. Both algorithms were able to provide admixture proportions when admixture was present.
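As a rough illustration of what an admixture proportion means, the sketch below estimates, by maximum likelihood over a grid, the fraction q of an individual's ancestry drawn from reference population A rather than B. It is not STRUCTURE's method, which infers allele frequencies and admixture proportions jointly by MCMC. Here the reference allele frequencies are assumed known, SNPs are assumed unlinked and in Hardy-Weinberg equilibrium, and the function names and frequencies are hypothetical.

```python
import math

# Assumptions: unlinked SNPs, Hardy-Weinberg equilibrium, and known
# reference allele frequencies. Names and frequencies are hypothetical.

def genotype_prob(count, p):
    """HWE probability of carrying `count` (0, 1 or 2) copies of an allele with frequency p."""
    if count == 2:
        return p * p
    if count == 1:
        return 2 * p * (1 - p)
    return (1 - p) * (1 - p)


def log_likelihood(genotypes, freqs_a, freqs_b, q):
    """Log-likelihood if a fraction q of ancestry comes from A and 1 - q from B."""
    total = 0.0
    for g, pa, pb in zip(genotypes, freqs_a, freqs_b):
        p = q * pa + (1 - q) * pb        # allele frequency in the mixed gene pool
        p = min(max(p, 1e-6), 1 - 1e-6)  # floor avoids log(0) for fixed alleles
        total += math.log(genotype_prob(g, p))
    return total


def estimate_admixture(genotypes, freqs_a, freqs_b, steps=100):
    """Grid-search the maximum-likelihood proportion of ancestry from population A."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(genotypes, freqs_a, freqs_b, q))


# Hypothetical frequencies for 10 SNPs that differ strongly between A and B.
freqs_a = [0.9] * 10
freqs_b = [0.1] * 10
print(estimate_admixture([1] * 10, freqs_a, freqs_b))  # prints 0.5
```

An individual heterozygous at every such SNP is best explained by an even mix of the two populations, whereas homozygosity for the A-common allele at every SNP drives the estimate to 1.0.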
Ancestry prediction accuracy (measured in terms of likelihood ratio) was generally high in the absence of admixture. Misclassification occurred in admixed individuals, who were likely offspring of inter-ethnic marriages and whose self-reported bio-geographic ancestry therefore followed that of their fathers, and in individuals of minority sub-populations with inter-ethnic beliefs. The 59 SNPs on the ForenSeq kit were reasonably effective in differentiating the Singapore Chinese, Malay and Indian sub-populations and will be useful for investigative purposes. However, evaluating other AIM sets could yield more accurate predictions.
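The likelihood ratio used to gauge prediction confidence can be illustrated with a naive Bayes classifier in the spirit of Snipper's likelihood-based assignment. This is a minimal sketch under stated assumptions, not Snipper's implementation: it gives each population an equal prior, treats SNPs as independent and in Hardy-Weinberg equilibrium, and defines the ratio as best versus second-best population likelihood. The population labels, allele frequencies and function names are hypothetical.

```python
import math

# Assumptions: equal priors, unlinked SNPs, Hardy-Weinberg genotype
# frequencies. Labels, frequencies and function names are hypothetical.

def genotype_prob(count, p):
    """HWE probability of carrying `count` (0, 1 or 2) copies of an allele with frequency p."""
    p = min(max(p, 1e-6), 1 - 1e-6)  # floor avoids zero probabilities
    if count == 2:
        return p * p
    if count == 1:
        return 2 * p * (1 - p)
    return (1 - p) * (1 - p)


def classify(genotypes, pop_freqs):
    """Naive Bayes assignment: return (best population, LR of best vs runner-up)."""
    log_liks = {
        pop: sum(math.log(genotype_prob(g, p)) for g, p in zip(genotypes, freqs))
        for pop, freqs in pop_freqs.items()
    }
    ranked = sorted(log_liks, key=log_liks.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best, math.exp(log_liks[best] - log_liks[runner_up])


pop_freqs = {
    "POP_A": [0.9, 0.9, 0.9],
    "POP_B": [0.5, 0.5, 0.5],
    "POP_C": [0.1, 0.1, 0.1],
}
best, lr = classify([2, 2, 2], pop_freqs)
print(best, round(lr, 2))  # prints POP_A 34.01
```

Working in log-likelihoods avoids numerical underflow when many SNPs are multiplied together. An individual whose genotypes are partly explained by several populations yields likelihoods that lie closer together, and therefore a smaller ratio.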