Estimating genome-wide IBD sharing from SNP data via an efficient hidden Markov model of LD with application to gene mapping.

Affiliation

Technion-Israel Institute of Technology, Computer Science Department, Haifa 32000, Israel.

Publication information

Bioinformatics. 2010 Jun 15;26(12):i175-82. doi: 10.1093/bioinformatics/btq204.

DOI: 10.1093/bioinformatics/btq204

PMID: 20529903

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2881389/

Abstract

MOTIVATION

Association analysis is the method of choice for studying complex multifactorial diseases. The premise of this method is that affected persons share some genomic regions with similar SNP alleles, and that the analysis will find these regions. An important disadvantage of genome-wide association (GWA) studies is that they do not distinguish between genomic regions that are inherited from a common ancestor [identical by descent (IBD)] and regions that are identical merely by state [identical by state (IBS)]. Clearly, regions that can be marked as IBD with higher probability, and that show the same correlation with disease status as regions that are more probably only IBS, are better candidates for being causative; yet this distinction is not encoded in standard association analysis.
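To make the IBS side of this distinction concrete, here is a minimal sketch of how IBS sharing can be read directly off unphased genotypes. It assumes genotypes are coded as minor-allele counts (0, 1 or 2), which is a common convention but not something the abstract specifies; the function names are illustrative only. IBD, by contrast, cannot be observed this way and has to be inferred.

```python
from typing import List, Sequence


def ibs_count(g1: int, g2: int) -> int:
    """Number of alleles (0, 1 or 2) two unphased genotypes share by state.

    Assumes genotypes are coded as minor-allele counts 0/1/2.
    """
    return 2 - abs(g1 - g2)


def ibs_profile(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Per-SNP IBS counts for a pair of individuals typed at the same SNPs."""
    if len(a) != len(b):
        raise ValueError("both individuals must be typed at the same SNPs")
    return [ibs_count(x, y) for x, y in zip(a, b)]


if __name__ == "__main__":
    person_a = [0, 1, 2, 2]
    person_b = [2, 1, 2, 1]
    print(ibs_profile(person_a, person_b))
```

Two unrelated people can show IBS = 2 at many SNPs purely because an allele is common, which is why an IBS profile alone does not tell you whether a region was inherited from a common ancestor.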

RESULTS

We develop a factorial hidden Markov model-based algorithm for computing genome-wide IBD sharing. The algorithm accepts as input SNP data of measured individuals and estimates the probability of IBD at each locus for every pair of individuals. For two g-degree relatives with g ≥ 8, the computation yields a precision of IBD tagging that is over 50% higher than that of previous methods at 95% recall. Our algorithm uses a first-order Markovian model for the linkage disequilibrium process and reduces the state space of the inheritance vector from exponential in g to quadratic in g. The higher accuracy, together with the reduced time complexity, makes our method a feasible means of IBD mapping in practical scenarios.
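The idea of estimating a per-locus IBD probability for one pair of individuals can be illustrated with a deliberately simplified two-state hidden Markov model (non-IBD vs. IBD) run over per-SNP IBS counts. This is not the factorial HMM described above: it ignores linkage disequilibrium and the inheritance-vector state space, and every numeric parameter (emission table, switching probabilities, prior) is a made-up assumption chosen only for illustration.

```python
from typing import List, Sequence


def ibd_posterior(
    obs: Sequence[int],
    emit: Sequence[Sequence[float]],
    p_enter: float,
    p_leave: float,
    prior_ibd: float,
) -> List[float]:
    """Posterior P(IBD at locus t | all observations) for a toy 2-state HMM.

    obs[t]  : IBS count (0, 1 or 2) observed at SNP t
    emit[s] : P(IBS count | state s), state 0 = non-IBD, state 1 = IBD
    p_enter : per-SNP probability of switching non-IBD -> IBD (assumed)
    p_leave : per-SNP probability of switching IBD -> non-IBD (assumed)
    """
    n = len(obs)
    states = (0, 1)
    trans = [[1.0 - p_enter, p_enter], [p_leave, 1.0 - p_leave]]
    init = [1.0 - prior_ibd, prior_ibd]

    # Forward pass, normalised at every locus to avoid underflow.
    fwd: List[List[float]] = []
    for t in range(n):
        if t == 0:
            a = [init[s] * emit[s][obs[0]] for s in states]
        else:
            prev = fwd[-1]
            a = [
                emit[s][obs[t]] * sum(prev[r] * trans[r][s] for r in states)
                for s in states
            ]
        z = sum(a)
        fwd.append([x / z for x in a])

    # Backward pass, normalised the same way.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [
            sum(trans[s][r] * emit[r][obs[t + 1]] * bwd[t + 1][r] for r in states)
            for s in states
        ]
        z = sum(b)
        bwd[t] = [x / z for x in b]

    # Combine: posterior is proportional to forward * backward per state.
    post = []
    for t in range(n):
        w = [fwd[t][s] * bwd[t][s] for s in states]
        post.append(w[1] / (w[0] + w[1]))
    return post


if __name__ == "__main__":
    # Hypothetical emission table: IBS = 0 is nearly impossible under IBD.
    emit = [[0.15, 0.45, 0.40], [0.01, 0.39, 0.60]]
    obs = [0, 1, 0, 2, 2, 2, 2, 2, 2, 2, 2, 0, 1, 0]
    for t, p in enumerate(ibd_posterior(obs, emit, 0.05, 0.05, 0.1)):
        print(t, obs[t], round(p, 3))
```

Thresholding these posteriors would give an IBD/non-IBD call per locus, and precision and recall of such calls are the quantities the abstract reports. A realistic implementation would also need emission probabilities that depend on allele frequencies and on neighbouring SNPs.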

AVAILABILITY

A software implementation, called IBDMAP, is freely available at http://bioinfo.cs.technion.ac.il/IBDmap.


