Analysis of multiple SNPs in genetic association studies: comparison of three multi-locus methods to prioritize and select SNPs.

Authors

Heidema A Geert, Feskens Edith J M, Doevendans Pieter A F M, Ruven Henk J T, van Houwelingen Hans C, Mariman Edwin C M, Boer Jolanda M A

Affiliation

Centre for Nutrition and Health, National Institute for Public Health and the Environment, Bilthoven, The Netherlands.

Publication information

Genet Epidemiol. 2007 Dec;31(8):910-21. doi: 10.1002/gepi.20251.

Abstract

Nonparametric approaches have been developed that are able to analyze large numbers of single nucleotide polymorphisms (SNPs) in modest sample sizes. These approaches have different selection features and may not provide similar results when applied to the same dataset. Therefore, we compared the results of three approaches (set association, random forests and multifactor dimensionality reduction [MDR]) to select from a total of 93 candidate SNPs a subset of SNPs that are important in determining high-density lipoprotein (HDL)-cholesterol levels. The study population consisted of a random sample from a Dutch monitoring project for cardiovascular disease risk factors and was dichotomized into cases (low HDL-cholesterol, n = 533) and non-cases (high HDL-cholesterol, n = 545) based on gender-specific median values for HDL-cholesterol. Clearly, all three approaches prioritized three SNPs as important (CETP Taq1B, CETP -629 C/A and LPL Ser447X). Two SNPs with weaker main effects were additionally prioritized by random forests (APOC3 3175 G/C and CCR2 Val62Ile), whereas MTHFR 677 C/T was selected in combination with CETP Taq1B as best model by MDR. Obtained p-values for the selected models were significant for the set association approach (p = 0.0019), random forests (p < 0.01) and MDR (p < 0.02). In conclusion, the application of a combination of multi-locus methods is a useful approach in genetic association studies to select a well-defined set of important SNPs for further statistical and epidemiological interpretation, providing increased confidence and more information compared with the application of only one method.
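
The case definition described in the abstract (a split at the gender-specific median HDL-cholesterol value) can be illustrated with a minimal sketch. This is not the authors' code: the function name `dichotomize_by_sex_median`, the toy values, and the handling of values exactly at the median are all assumptions made for illustration.

```python
from statistics import median


def dichotomize_by_sex_median(records):
    """Label each record as a case (1, low HDL) or a non-case (0, high HDL).

    `records` is a list of (sex, hdl) tuples. The cut-off is the median HDL
    within each sex. Values equal to the median are labelled non-cases here;
    that tie rule is an assumption, since the abstract does not state one.
    """
    # Group HDL values by sex so each sex gets its own median cut-off.
    by_sex = {}
    for sex, hdl in records:
        by_sex.setdefault(sex, []).append(hdl)
    cutoffs = {sex: median(values) for sex, values in by_sex.items()}

    # A record is a case when its HDL lies below the cut-off for its sex.
    return [1 if hdl < cutoffs[sex] else 0 for sex, hdl in records]


# Hypothetical toy data (mmol/L), not taken from the study.
toy = [("M", 0.9), ("M", 1.1), ("M", 1.3), ("M", 1.5),
       ("F", 1.2), ("F", 1.4), ("F", 1.6), ("F", 1.8)]
labels = dichotomize_by_sex_median(toy)
```

Splitting within each sex keeps the case and non-case groups roughly balanced for men and women, which is consistent with the near-equal group sizes reported (533 and 545).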
