Asimit Jennifer L, Yoo Yun Joo, Waggott Daryl, Sun Lei, Bull Shelley B
Samuel Lunenfeld Research Institute of Mount Sinai Hospital, 60 Murray Street, Box 18, Toronto, Ontario M5T 3L9, Canada.
Dalla Lana School of Public Health, University of Toronto, 155 College Street, Toronto M5T 3M7, Canada.
BMC Proc. 2009 Dec 15;3 Suppl 7(Suppl 7):S127. doi: 10.1186/1753-6561-3-s7-s127.
Because single-nucleotide polymorphism (SNP) data are high-dimensional, region-based methods are an attractive approach to identifying genetic variation associated with a phenotype. A common approach to defining regions is to identify the most significant SNPs from a single-SNP association analysis and then use a gene database to obtain a list of genes proximal to the identified SNPs. Alternatively, regions may be defined statistically, via a scan statistic. After SNPs are categorized as significant or not (based on the single-SNP association p-values), a scan statistic identifies regions that contain more significant SNPs than expected by chance. Important features of this method are that regions are defined statistically, so that there is no dependence on a gene database, and that both genic and intergenic regions can be detected. In the analysis of blood-lipid phenotypes from the Framingham Heart Study (FHS), we compared statistically defined regions with those formed from the top single-SNP tests. Although we missed a number of single SNPs, we also identified many additional regions not found as SNP-database regions and avoided issues related to region definition. In addition, analyses of candidate genes for high-density lipoprotein, low-density lipoprotein, and triglyceride levels suggested that associations detected with region-based statistics are also found using the scan statistic approach.
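The scan-statistic idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`binomial_tail`, `scan_windows`), the fixed window of consecutive SNPs, and the threshold `alpha` are all assumptions made for illustration. SNPs ordered by position are flagged as significant when their single-SNP p-value is at most `alpha`, and each window is scored by its count of flagged SNPs. The binomial tail probability shown is only a nominal, per-window value that assumes independent SNPs; it ignores linkage disequilibrium and the fact that many overlapping windows are scanned, so a real analysis would need a proper scan-statistic null distribution or a permutation procedure.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def scan_windows(p_values, alpha=0.05, window=5):
    """Slide a fixed-size window over SNPs ordered by genomic position.

    p_values : single-SNP association p-values, in position order.
    alpha    : hypothetical threshold for calling a SNP significant.
    window   : hypothetical window size, in number of consecutive SNPs.

    Returns a list of (start_index, significant_count, nominal_tail_prob).
    The tail probability assumes independent SNPs and is not corrected
    for scanning over many overlapping windows.
    """
    flags = [1 if p <= alpha else 0 for p in p_values]
    results = []
    count = sum(flags[:window])
    for start in range(len(flags) - window + 1):
        if start > 0:
            # Update the running count: add the SNP entering the window,
            # drop the SNP leaving it.
            count += flags[start + window - 1] - flags[start - 1]
        results.append((start, count, binomial_tail(count, window, alpha)))
    return results


if __name__ == "__main__":
    # Made-up p-values for illustration only.
    example = [0.5, 0.6, 0.01, 0.02, 0.03, 0.7, 0.8, 0.9]
    for start, count, prob in scan_windows(example, alpha=0.05, window=3):
        print(start, count, prob)
```

In this sketch, the window with the largest count is the candidate region; because it is located from the data alone, no gene database is needed, and the window may fall inside or between genes.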