Investigation of the ability of haplotype association and logistic regression to identify associated susceptibility loci.

Author information

North B V, Sham P C, Knight J, Martin E R, Curtis D

Affiliation

Institute of Cancer Research, 15 Cotswold Road, Belmont, Sutton, Surrey SM2 5NG, UK.

Publication information

Ann Hum Genet. 2006 Nov;70(Pt 6):893-906. doi: 10.1111/j.1469-1809.2006.00301.x.

Abstract

While finely spaced markers are increasingly being used in case-control association studies in attempts to identify susceptibility loci, not enough is yet known as to the optimal spacing of such markers, their likely power to detect association, the relative merits of single marker versus multimarker analysis, or which methods of analysis may be optimal. Some investigations of these issues have used markers simulated under different theoretical models of population evolution. However, the HapMap project and other sources provide real datasets which can be used to obtain a more realistic view of the performance of these approaches. SNPs around APOE and from two HapMap regions were used to obtain information regarding linkage disequilibrium (LD) relationships between polymorphisms, and these real patterns of LD were used to simulate datasets such as would be obtained in case-control studies were these SNPs to influence susceptibility to disease. The datasets obtained were analysed using tests for heterogeneity of estimated haplotype frequencies and using logistic regression analyses in which only main effects from each marker were considered. All markers surrounding the putative susceptibility locus were analysed, using sets of either 1, 2, 3 or 4 markers at a time. Some markers within 150 kb of the susceptibility locus were able to detect association. At distances less than 100 kb there was no correlation between the distance from the susceptibility locus and the strength of evidence for association. When the average inter-locus spacing is 25 kb many loci would not be detected, while when the spacing is as low as 2 kb one can be fairly confident that at least one marker will be in strong enough LD with the susceptibility locus to enable association to be detected, if the susceptibility locus has a strong enough effect relative to the sample size.
With an inter-locus spacing of 4 kb some susceptibility loci did not have a marker locus in strong LD, potentially undermining the ability to detect association. There was little difference in the performance of haplotype-based analysis compared with logistic regression considering effects of each marker as separate. Multimarker analysis on occasion produced results which were much more highly significant than single marker analysis, but only very rarely. Our results support the view that if markers are randomly selected then a spacing as low as 2 kb is desirable. Multimarker analysis can sometimes be more powerful than single marker analysis so both should be performed. However, because it is rare for multimarker analysis to be much more highly significant than single marker analysis one should strongly suspect that when such results occur they may be due to mistakes in genotyping or through some other artefact. Haplotype analysis may be more prone to such problems than logistic regression, suggesting that the latter method might be preferred.
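
The dependence of detection on LD between a marker and the susceptibility locus can be illustrated with a small sketch. This is not the authors' procedure: it assumes only two biallelic loci, random pairing of haplotypes, a multiplicative risk model, and a simple allelic chi-square test in place of the haplotype heterogeneity test or logistic regression used in the study. The haplotype frequencies, baseline risk, relative risk and function names below are all hypothetical values chosen for illustration.

```python
import random


def ld_r2(p_ab, p_a, p_b):
    """Squared correlation r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: marginal frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def allelic_chi2(case_alt, case_total, ctrl_alt, ctrl_total):
    """Pearson chi-square statistic (1 df) on a 2x2 table of allele counts."""
    table = [[case_alt, case_total - case_alt],
             [ctrl_alt, ctrl_total - ctrl_alt]]
    rows = [case_total, ctrl_total]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    n = case_total + ctrl_total
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def simulate_marker_counts(hap_freqs, n_cases, n_controls,
                           baseline=0.05, rr=2.0, seed=0):
    """Simulate a case-control sample and count marker alleles in each group.

    hap_freqs maps (risk_allele, marker_allele) pairs, each 0 or 1, to
    haplotype frequencies. Disease probability is baseline * rr ** k,
    where k is the number of risk alleles carried (assumed model).
    """
    rng = random.Random(seed)
    haps = list(hap_freqs)
    weights = [hap_freqs[h] for h in haps]
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        h1, h2 = rng.choices(haps, weights=weights, k=2)
        affected = rng.random() < baseline * rr ** (h1[0] + h2[0])
        group, cap = (cases, n_cases) if affected else (controls, n_controls)
        if len(group) < cap:
            group.append(h1[1] + h2[1])  # marker alleles carried (0, 1 or 2)
    return sum(cases), sum(controls)


# Hypothetical haplotype frequencies: risk allele 0.20, marker allele 0.25.
hap_freqs = {(1, 1): 0.18, (1, 0): 0.02, (0, 1): 0.07, (0, 0): 0.73}
r2 = ld_r2(hap_freqs[(1, 1)], 0.20, 0.25)
case_alt, ctrl_alt = simulate_marker_counts(hap_freqs, 500, 500)
chi2 = allelic_chi2(case_alt, 2 * 500, ctrl_alt, 2 * 500)
print(f"r^2 = {r2:.3f}, allelic chi-square at the marker = {chi2:.2f}")
```

Lowering the frequency of the `(1, 1)` haplotype toward the product of the two allele frequencies reduces r^2, and the chi-square statistic at the marker tends to shrink with it, which is the effect that makes wider marker spacing risky.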

