Cancer Research UK Genetic Epidemiology Division, University of Leeds, Cancer Genetics Building, St James's University Hospital, Beckett Street, Leeds LS9 7TF, UK.
BMC Genet. 2005 Dec 30;6 Suppl 1(Suppl 1):S72. doi: 10.1186/1471-2156-6-S1-S72.
In genetic association studies, linkage disequilibrium (LD) within a region can be exploited to select a subset of single-nucleotide polymorphisms (SNPs) to genotype with minimal loss of information. A novel entropy-based method for selecting SNPs is proposed and compared to an existing method based on the coefficient of determination (R²) using simulated data from Genetic Analysis Workshop 14. The effect of the size of the sample used to investigate LD (by estimating haplotype frequencies) and hence select the SNPs is also investigated for both measures. It is found that the novel method and the established method select SNP subsets that do not differ greatly. The entropy-based measure may thus have value because it is easier to compute than R². Increasing the sample size used to estimate haplotype frequencies improves the predictive power of the subset of SNPs selected. A smaller subset of SNPs chosen using a large initial sample to estimate LD can in some instances be more informative than a larger subset chosen based on poor estimates of LD (using a small initial sample). An initial sample size of 50 individuals is sufficient in most situations investigated, which involved selection from a set of 7 SNPs, although to select a larger number of SNPs, a larger initial sample size may be required.
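As a rough illustration of what entropy-based subset selection can look like, the Python sketch below scores each candidate subset of SNPs by the Shannon entropy of the haplotype distribution restricted to that subset and keeps the highest-scoring one. The abstract does not give the exact form of the entropy measure, so this scoring rule is an assumption, and the function names (`subset_entropy`, `select_snps`) and the haplotype frequencies are hypothetical. In practice the frequencies would be estimated from an initial sample, which is where the sample-size effect discussed above enters.

```python
from itertools import combinations
from math import log2


def subset_entropy(hap_freqs, subset):
    """Shannon entropy (in bits) of haplotypes restricted to the SNPs in `subset`.

    Assumption: this is one plausible entropy score, not necessarily the
    measure used in the paper.
    """
    marginal = {}
    for hap, freq in hap_freqs.items():
        # Collapse haplotypes that look identical on the chosen SNPs.
        key = tuple(hap[i] for i in subset)
        marginal[key] = marginal.get(key, 0.0) + freq
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def select_snps(hap_freqs, k):
    """Exhaustively pick the k-SNP subset with the highest restricted entropy."""
    n_snps = len(next(iter(hap_freqs)))
    return max(
        combinations(range(n_snps), k),
        key=lambda s: subset_entropy(hap_freqs, s),
    )


# Hypothetical haplotype frequencies over 4 SNPs (alleles coded 0/1).
# SNP 0 and SNP 1 are in complete LD, so typing both adds no information.
hap_freqs = {
    (0, 0, 0, 0): 0.4,
    (1, 1, 0, 1): 0.3,
    (1, 1, 1, 0): 0.2,
    (0, 0, 1, 1): 0.1,
}

best = select_snps(hap_freqs, 2)
print(best, subset_entropy(hap_freqs, best))
```

In this toy example the redundant pair (SNP 0, SNP 1) scores only 1 bit, whereas a pair that distinguishes all four haplotypes retains the full entropy of the 4-SNP distribution. Exhaustive search is feasible for a set of 7 SNPs; larger sets would need a greedy or heuristic search.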