College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao Shandong, China.
Bioinformatics. 2017 Jul 15;33(14):2078-2081. doi: 10.1093/bioinformatics/btx151.
Effective tagging single-nucleotide polymorphism (SNP)-set selection is crucial to SNP-set analysis in genome-wide association studies (GWAS). Most existing tagging SNP-set selection methods cannot make full use of the information hidden in common or rare variants associated with diseases. Some SNPs carry overlapping genetic information owing to the linkage disequilibrium (LD) structure between them. Therefore, when testing the association between SNPs and disease susceptibility, it is sufficient to select the representative SNPs (called the tag SNP-set or tagSNP-set) with maximum information.
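To illustrate how LD makes some SNPs redundant, the following is a minimal sketch of a generic greedy tagging heuristic based on a pairwise r² threshold. It is not the method proposed in this paper. The function names, the 0/1/2 genotype coding, the use of squared Pearson correlation of genotypes as the LD measure, and the 0.8 threshold are all assumptions made for illustration.

```python
import numpy as np


def pairwise_r2(genotypes):
    """Squared Pearson correlation between SNP columns.

    Assumes `genotypes` is a (samples x SNPs) array coded 0/1/2 and that
    every SNP is polymorphic (a constant column would give NaN).
    """
    corr = np.corrcoef(genotypes, rowvar=False)
    return corr ** 2


def greedy_tag_snps(genotypes, r2_threshold=0.8):
    """Greedily pick tag SNPs until every SNP is tagged.

    A SNP counts as tagged when its r^2 with some chosen tag SNP is at
    least `r2_threshold` (hypothetical default, chosen for illustration).
    """
    r2 = pairwise_r2(genotypes)
    untagged = set(range(r2.shape[0]))
    tags = []
    while untagged:
        # Choose the SNP that covers the most still-untagged SNPs;
        # ties go to the lowest index because candidates are sorted.
        best = max(
            sorted(untagged),
            key=lambda i: sum(r2[i, j] >= r2_threshold for j in untagged),
        )
        tags.append(best)
        covered = {j for j in untagged if r2[best, j] >= r2_threshold}
        untagged -= covered
        untagged.discard(best)  # guarantees progress even if r2 is NaN
    return tags
```

With a genotype matrix in which two columns are identical, the sketch keeps only one of them as a tag, which is the sense in which a representative SNP "stands in" for others in high LD with it.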
We propose a new tagSNP-set selection method based on LD information between SNPs, namely TagSNP-Set with Maximum Information. Compared with classical SNP-set analysis methods, our method not only has higher power but also minimizes the number of selected tagSNPs and maximizes the information they provide, with lower genotyping cost and lower time complexity.
Supplementary data are available at Bioinformatics online.