Department of Statistics, University of Washington, Seattle, Washington 98195-4322, USA.
Genetics. 2012 Apr;190(4):1447-60. doi: 10.1534/genetics.111.137570. Epub 2012 Jan 31.
In both pedigree linkage studies and population-based association studies, there has been much interest in using modern dense genetic marker data to infer segments of gene identity by descent (ibd) among individuals not known to be related, in order to increase power and resolution in localizing genes affecting complex traits. In this article, we present a hidden Markov model (HMM) for ibd among a set of chromosomes and describe methods and software for inference of ibd among the four chromosomes of pairs of individuals, using either phased (haplotypic) or unphased (genotypic) data. The model allows for missing data and typing error but does not model linkage disequilibrium (LD), because fitting an accurate LD model requires large samples from well-studied populations. However, LD remains a major confounding factor, since LD is itself a reflection of coancestry at the population level. To study the impact of LD, we have developed a novel simulation approach that generates realistic dense marker data for the same set of markers at varying levels of LD. Using this approach, we present results of a study of the impact of LD on the sensitivity and specificity of our HMM in estimating segments of ibd among sets of four chromosomes and between genotype pairs. We show that, despite not incorporating LD, our model has been quite successful in detecting segments as small as 10^6 bp (1 Mbp); we also present comparisons with fastIBD, which uses an LD model in estimating ibd.
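To make the idea of an ibd HMM with missing data and typing error concrete, here is a minimal sketch in Python. It is not the model or software described in the article: it is a deliberately reduced two-state version (ibd versus non-ibd) for a single pair of phased haplotypes at biallelic markers, with no LD modeling. The function names and the parameters `pi_ibd` (prior ibd probability), `rate` (expected ibd state changes per Morgan), and `eps` (per-allele typing error rate) are assumptions introduced purely for illustration.

```python
import math


def emission(a, b, p, ibd, eps):
    """P(observed alleles a, b | state) at a biallelic marker.

    p is the frequency of allele 1. Alleles are 0, 1, or None (missing).
    Each observed allele is assumed to be mistyped independently with
    probability eps. A marker with a missing allele returns 1.0 for both
    states, so it does not change the ibd posterior.
    """
    if a is None or b is None:
        return 1.0

    def obs(x, t):
        # Probability of observing x when the true allele is t.
        return 1.0 - eps if x == t else eps

    prior = {1: p, 0: 1.0 - p}
    if ibd:
        # Both haplotypes carry the same underlying allele t.
        return sum(prior[t] * obs(a, t) * obs(b, t) for t in (0, 1))

    def marginal(x):
        return sum(prior[t] * obs(x, t) for t in (0, 1))

    # Non-ibd: the two underlying alleles are independent draws.
    return marginal(a) * marginal(b)


def transition(d, pi_ibd, rate):
    """2x2 transition matrix over genetic distance d (Morgans).

    State 0 is non-ibd, state 1 is ibd. The stationary distribution is
    (1 - pi_ibd, pi_ibd).
    """
    e = math.exp(-rate * d)
    return [
        [(1.0 - pi_ibd) + pi_ibd * e, pi_ibd * (1.0 - e)],
        [(1.0 - pi_ibd) * (1.0 - e), pi_ibd + (1.0 - pi_ibd) * e],
    ]


def posterior_ibd(hap1, hap2, freqs, positions, pi_ibd=0.05, rate=10.0, eps=0.01):
    """Posterior P(ibd) at each marker via scaled forward-backward."""
    n = len(hap1)
    emis = [
        [emission(hap1[i], hap2[i], freqs[i], s == 1, eps) for s in (0, 1)]
        for i in range(n)
    ]

    def normalize(v):
        total = v[0] + v[1]
        return [v[0] / total, v[1] / total]

    # Forward pass, rescaled at every marker to avoid underflow.
    fwd = [None] * n
    fwd[0] = normalize([(1.0 - pi_ibd) * emis[0][0], pi_ibd * emis[0][1]])
    for i in range(1, n):
        T = transition(positions[i] - positions[i - 1], pi_ibd, rate)
        fwd[i] = normalize([
            emis[i][s] * (fwd[i - 1][0] * T[0][s] + fwd[i - 1][1] * T[1][s])
            for s in (0, 1)
        ])

    # Backward pass with the same rescaling.
    bwd = [None] * n
    bwd[n - 1] = [1.0, 1.0]
    for i in range(n - 2, -1, -1):
        T = transition(positions[i + 1] - positions[i], pi_ibd, rate)
        bwd[i] = normalize([
            sum(T[r][s] * emis[i + 1][s] * bwd[i + 1][s] for s in (0, 1))
            for r in (0, 1)
        ])

    return [normalize([fwd[i][s] * bwd[i][s] for s in (0, 1)])[1] for i in range(n)]


if __name__ == "__main__":
    # Toy data: 200 markers spaced 1e-4 Morgans apart, allele frequency 0.5.
    n = 200
    h1 = [i % 2 for i in range(n)]
    h2 = list(h1)
    h2[50] = None  # one missing call
    post = posterior_ibd(h1, h2, [0.5] * n, [i * 1e-4 for i in range(n)])
    print(min(post), max(post))
```

Because this sketch ignores LD, runs of matching alleles that arise from background haplotype structure would be scored as evidence for ibd, which is the kind of confounding the simulation study in the article is designed to measure. Extending the sketch to the four chromosomes of two individuals, or to unphased genotypes, would require a larger state space than the two states used here.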