Technion-Israel Institute of Technology, Computer Science Department Haifa, 32000 Israel.
Bioinformatics. 2010 Jun 15;26(12):i175-82. doi: 10.1093/bioinformatics/btq204.
Association analysis is the method of choice for studying complex multifactorial diseases. The premise of this method is that affected persons share some genomic regions with similar SNP alleles, and that such regions will be found by the analysis. An important disadvantage of genome-wide association (GWA) studies is that they do not distinguish between genomic regions inherited from a common ancestor [identical by descent (IBD)] and regions that are identical merely by state [identical by state (IBS)]. Clearly, regions that can be marked with higher probability as IBD, and that show the same correlation with disease status as identical regions that are more probably only IBS, are better candidates to be causative, yet this distinction is not encoded in standard association analysis.
We develop a factorial hidden Markov model-based algorithm for computing genome-wide IBD sharing. The algorithm accepts as input SNP data of measured individuals and estimates the probability of IBD at each locus for every pair of individuals. For two g-degree relatives with g ≥ 8, the computation yields an IBD-tagging precision more than 50% higher than that of previous methods at 95% recall. Our algorithm uses a first-order Markovian model for the linkage disequilibrium process and reduces the state space of the inheritance vector from exponential in g to quadratic in g. The higher accuracy, together with the reduced time complexity, marks our method as a feasible means for IBD mapping in practical scenarios.
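To illustrate what "estimating the probability of IBD at each locus" means for one pair of individuals, the following minimal sketch runs forward-backward on a plain two-state hidden Markov model (non-IBD vs. IBD). It is not the factorial HMM implemented in IBDMAP: it assumes a single pair of haplotypes, ignores linkage disequilibrium, and uses a fixed per-locus switch probability. The function name `ibd_posteriors` and the parameters `switch`, `prior_ibd`, and `err` are hypothetical and chosen only for this example.

```python
def ibd_posteriors(matches, allele_freqs, switch=0.05, prior_ibd=0.1, err=0.01):
    """Per-locus posterior P(IBD) from a toy two-state HMM (0 = non-IBD, 1 = IBD).

    matches[i]      -- True if the two haplotypes carry the same allele at locus i
    allele_freqs[i] -- population frequency of one allele at locus i (biallelic SNP)
    switch, prior_ibd, err -- illustrative parameters, not values from the paper
    """
    n = len(matches)
    if n == 0:
        return []
    trans = [[1 - switch, switch], [switch, 1 - switch]]
    init = [1 - prior_ibd, prior_ibd]

    def emit(i):
        # Probability of the observation at locus i under each hidden state.
        p = allele_freqs[i]
        match_non_ibd = p * p + (1 - p) * (1 - p)  # chance match (IBS only)
        if matches[i]:
            return [match_non_ibd, 1 - err]
        return [1 - match_non_ibd, err]  # a mismatch under IBD is a typing error

    # Forward pass, normalised at every locus to avoid underflow.
    fwd = []
    for i in range(n):
        e = emit(i)
        if i == 0:
            cur = [init[s] * e[s] for s in (0, 1)]
        else:
            cur = [e[s] * sum(fwd[-1][r] * trans[r][s] for r in (0, 1))
                   for s in (0, 1)]
        z = cur[0] + cur[1]
        fwd.append([c / z for c in cur])

    # Backward pass, also normalised.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        e = emit(i + 1)
        cur = [sum(trans[s][r] * e[r] * bwd[i + 1][r] for r in (0, 1))
               for s in (0, 1)]
        z = cur[0] + cur[1]
        bwd[i] = [c / z for c in cur]

    # Combine both passes into the posterior probability of the IBD state.
    post = []
    for i in range(n):
        non_ibd = fwd[i][0] * bwd[i][0]
        ibd = fwd[i][1] * bwd[i][1]
        post.append(ibd / (non_ibd + ibd))
    return post


if __name__ == "__main__":
    # 20 matching loci followed by 10 loci that alternate mismatch/match.
    obs = [True] * 20 + [False, True] * 5
    freqs = [0.5] * len(obs)
    for locus, prob in enumerate(ibd_posteriors(obs, freqs)):
        print(locus, round(prob, 3))
```

In this toy setting a long run of matching alleles pushes the posterior toward IBD, while a region with frequent mismatches pushes it toward non-IBD. Thresholding the posterior then gives the kind of IBD tagging whose precision and recall the abstract reports.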
A software implementation, called IBDMAP, is freely available at http://bioinfo.cs.technion.ac.il/IBDmap.