Setsirichok Damrongrit, Piroonratana Theera, Assawamakin Anunchai, Usavanarong Touchpong, Limwongse Chanin, Wongseree Waranyu, Aporntewan Chatchawit, Chaiyaratana Nachol
Department of Electrical Engineering, Faculty of Engineering, King Mongkut's University of Technology North Bangkok, 1518 Piboolsongkram Road, Bangsue, Bangkok 10800, Thailand.
Int J Data Min Bioinform. 2012;6(6):651-74. doi: 10.1504/ijdmb.2012.050249.
A protocol for identifying Ancestry Informative Markers (AIMs) from genome-wide Single Nucleotide Polymorphism (SNP) data is proposed. The protocol consists of three main steps: identification of potential positive selection regions via extreme values of the fixation index (F_ST), SNP screening via two-stage attribute selection, and construction of a classification model using a Naïve Bayes classifier. The two-stage attribute selection combines a newly developed round-robin Symmetrical Uncertainty (SU) ranking technique with a wrapper that embeds a Naïve Bayes classifier. The protocol has been applied to the HapMap Phase II data. Two AIM panels, consisting of 10 and 16 SNPs, are identified; each achieves complete classification among the CEU, CHB, JPT and YRI populations. Both panels are at most one quarter the size of those reported in previous studies. The results suggest that the protocol could be useful in scenarios involving a larger number of populations.
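As an illustration of the ranking criterion named above, the following is a minimal sketch of the standard symmetrical uncertainty measure, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), applied to SNP genotypes and population labels. It is not the authors' implementation: the abstract does not detail the round-robin scheme, so only a plain SU ranking is shown, and the function names, the genotype coding (0/1/2) and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Standard SU: 2 * I(X;Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant, so there is nothing to share
    joint = entropy(list(zip(x, y)))
    mutual_information = hx + hy - joint
    return 2.0 * mutual_information / (hx + hy)


def rank_snps(genotypes, labels):
    """Order SNP identifiers by decreasing SU with the population labels.

    `genotypes` maps a (hypothetical) SNP identifier to its per-sample calls.
    """
    scores = {
        snp: symmetrical_uncertainty(calls, labels)
        for snp, calls in genotypes.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (invented): four samples, two populations, two SNPs.
labels = ["CEU", "CEU", "YRI", "YRI"]
genotypes = {
    "snp_a": [0, 0, 2, 2],  # perfectly separates the two populations
    "snp_b": [0, 1, 0, 1],  # carries no information about population
}
ranking = rank_snps(genotypes, labels)
print(ranking)
```

In this toy case `snp_a` has SU equal to 1 and `snp_b` has SU equal to 0, so `snp_a` ranks first. In a filter-then-wrapper design such as the one described, a ranking of this kind would typically shortlist candidates before a classifier-based wrapper selects the final panel.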