SNPs, haplotypes, and model selection in a candidate gene region: the SIMPle analysis for multilocus data.

Author information

Conti David V, Gauderman W James

Affiliation

Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA.

Publication information

Genet Epidemiol. 2004 Dec;27(4):429-41. doi: 10.1002/gepi.20039.

DOI:10.1002/gepi.20039

PMID:15543635

Abstract

Modern molecular techniques make discovery of numerous single nucleotide polymorphisms (SNPs) in candidate gene regions feasible. Conventional analysis relies on either independent tests with each variant or the use of haplotypes in association analysis. The first technique ignores the dependencies between SNPs. The second, though it may increase power, often introduces uncertainty by estimating haplotypes from population data. Additionally, as the number of loci in a haplotype expands, it becomes more ambiguous which underlying genetic components drive a detected association. Here, we present a genotype-level analysis to jointly model the SNPs via a SNP interaction model with phase information (SIMPle) to capture the underlying haplotype structure. This analysis estimates both the risk associated with each variant and the importance of phase between pairwise combinations of SNPs. Thus, rather than selecting between genotype- or haplotype-level approaches, the SIMPle method frames the analysis of multilocus data in a model selection paradigm, with the aim of determining which SNPs, phase terms, and linear combinations best describe the relation between genetic variation and a trait of interest. To avoid unstable estimation due to sparse data and to incorporate both the dependencies among terms and the uncertainty in model selection, we propose a Bayes model averaging procedure. This highlights key SNPs and phase terms and yields a set of best representative models. Using simulations, we demonstrate the utility of the SIMPle model to identify crucial SNPs and underlying haplotype structures across a variety of causal models and genetic architectures.
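
The general idea of combining SNP main effects with pairwise phase terms and averaging over candidate models can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions that the abstract does not specify: the phase coding (+1 for cis, -1 for trans in double heterozygotes, 0 otherwise), the use of a quantitative trait with ordinary least squares, the BIC approximation to posterior model weights, the simulated data with known phase, and all function names.

```python
import itertools
import numpy as np


def phase_term(h1, h2, i, j):
    """Assumed phase coding for SNPs i and j (illustrative only):
    +1 if a double heterozygote carries both variant alleles on the same
    haplotype (cis), -1 if on opposite haplotypes (trans), 0 otherwise."""
    double_het = (h1[:, i] != h2[:, i]) & (h1[:, j] != h2[:, j])
    # In a double heterozygote, equal alleles at i and j on haplotype 1 means cis.
    cis = h1[:, i] == h1[:, j]
    return np.where(double_het, np.where(cis, 1.0, -1.0), 0.0)


def design_terms(h1, h2):
    """Build additive genotype terms (0/1/2) and pairwise phase terms."""
    n_snps = h1.shape[1]
    terms = {f"snp{i}": (h1[:, i] + h2[:, i]).astype(float) for i in range(n_snps)}
    for i, j in itertools.combinations(range(n_snps), 2):
        terms[f"phase{i}_{j}"] = phase_term(h1, h2, i, j)
    return terms


def bic_linear(y, columns):
    """BIC of an ordinary least squares fit with an intercept."""
    n = len(y)
    X = np.column_stack([np.ones(n)] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + X.shape[1] * np.log(n)


def model_average(y, terms):
    """Enumerate all subsets of terms and weight each model by exp(-BIC/2),
    a common approximation to posterior model probabilities under equal
    prior weights (an assumption, not the procedure described in the paper)."""
    names = list(terms)
    models, bics = [], []
    for k in range(len(names) + 1):
        for subset in itertools.combinations(names, k):
            models.append(subset)
            bics.append(bic_linear(y, [terms[name] for name in subset]))
    bics = np.array(bics)
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    inclusion = {
        name: float(sum(w for m, w in zip(models, weights) if name in m))
        for name in names
    }
    return models, weights, inclusion


# Simulated data: three unlinked SNPs with known phase (a simplification).
rng = np.random.default_rng(0)
n, n_snps = 2000, 3
h1 = rng.integers(0, 2, size=(n, n_snps))
h2 = rng.integers(0, 2, size=(n, n_snps))
terms = design_terms(h1, h2)

# Hypothetical causal model: one SNP main effect plus one phase effect.
y = 0.5 * terms["snp0"] + 0.8 * terms["phase1_2"] + rng.normal(size=n)

models, weights, inclusion = model_average(y, terms)
best = models[int(np.argmax(weights))]
print("highest-weight model:", best)
for name, prob in inclusion.items():
    print(f"{name}: inclusion probability {prob:.3f}")
```

The per-term inclusion probabilities play the role of "highlighting key SNPs and phase terms", and the highest-weight models correspond to a set of representative models. Exhaustive enumeration is only practical for a handful of terms; with more SNPs a stochastic search over models would be needed, and with real genotype data phase is usually unobserved and would have to be handled explicitly.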
