Phuong Tu Minh, Lin Zhen, Altman Russ B
Department of Information Technology, Post & Telecom. Institute of Technology, Hanoi, Vietnam.
Proc IEEE Comput Syst Bioinform Conf. 2005:301-9. doi: 10.1109/csb.2005.22.
A major challenge for genome-wide disease association studies is the high cost of genotyping a large number of single nucleotide polymorphisms (SNPs). The correlations between SNPs, however, make it possible to select a parsimonious set of informative SNPs, known as "tagging" SNPs, that capture most of the variation in a population. Considerable research interest has recently focused on the development of methods for finding such SNPs. In this paper, we present an efficient method for finding tagging SNPs. The method does not involve a computation-intensive search over SNP subsets; instead, it discards redundant SNPs using a feature selection algorithm. In contrast to most existing methods, the method presented here is not limited to correlations between SNPs in local groups. By also using correlations that occur across different chromosomal regions, the method can reduce the number of globally redundant SNPs. Experimental results show that our method selects fewer tagging SNPs than block-based methods.
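The idea of discarding redundant SNPs without searching over subsets can be illustrated with a minimal sketch. This is not the authors' feature selection algorithm, which the abstract does not specify; it is a simple greedy filter based on squared Pearson correlation (r²), shown only to make the notion of global redundancy concrete. The function names, the 0/1/2 genotype coding, and the 0.8 threshold are all assumptions made for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A SNP with no variation has undefined correlation; report 0 here.
        return 0.0
    return (cov * cov) / (var_x * var_y)


def select_tag_snps(snps, threshold=0.8):
    """Greedy redundancy filter (hypothetical helper, not the paper's method).

    snps: list of per-SNP genotype vectors, one value (0, 1 or 2) per individual.
    A SNP is kept unless its r^2 with some already kept SNP reaches the
    threshold. Every kept SNP is compared, regardless of how far apart the two
    SNPs lie, so redundancy is judged globally rather than within local blocks.
    """
    kept = []
    for idx, snp in enumerate(snps):
        redundant = any(r_squared(snp, snps[k]) >= threshold for k in kept)
        if not redundant:
            kept.append(idx)
    return kept


# Toy data: 4 SNPs genotyped in 6 individuals (assumed values).
example = [
    [0, 1, 2, 0, 1, 2],  # SNP 0
    [0, 1, 2, 0, 1, 2],  # SNP 1: identical to its neighbour SNP 0
    [2, 0, 1, 1, 0, 2],  # SNP 2: uncorrelated with SNP 0
    [2, 1, 0, 2, 1, 0],  # SNP 3: mirror image of SNP 0, so r^2 = 1
]
print(select_tag_snps(example))  # prints [0, 2]
```

In this toy example SNP 3 is dropped even though it is not adjacent to SNP 0, which is the kind of redundancy a purely block-based approach could miss. A single pass like this costs at most one correlation per pair of SNPs, in contrast to an exhaustive search over subsets.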