Whole genome association mapping by incompatibilities and local perfect phylogenies.

Author information

Mailund Thomas, Besenbacher Søren, Schierup Mikkel H

Affiliation

Department of Statistics, University of Oxford, UK.

Publication information

BMC Bioinformatics. 2006 Oct 16;7:454. doi: 10.1186/1471-2105-7-454.

PMID:17042942

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1624851/

Abstract

BACKGROUND

With current technology, vast amounts of data can be cheaply and efficiently produced in association studies. To prevent data analysis from becoming the bottleneck of such studies, fast and efficient analysis methods that scale to these data set sizes must be developed.

RESULTS

We present a fast method for accurate localisation of disease-causing variants in high-density case-control association mapping experiments with large numbers of cases and controls. The method searches for significant clustering of case chromosomes in the "perfect" phylogenetic tree defined by the largest region around each marker that is compatible with a single phylogenetic tree. This perfect phylogenetic tree is treated as a decision tree for determining disease status and is scored by its accuracy as a decision tree. The rationale is that the perfect phylogeny near a disease-affecting mutation should provide more information about the affected/unaffected classification than random trees. If regions of compatibility contain few markers, e.g. because of large marker spacing, the algorithm can allow the inclusion of incompatible markers in order to enlarge the regions prior to estimating their phylogeny. Haplotype data and phased genotype data can be analysed. The power and efficiency of the method are investigated on 1) simulated genotype data under different models of disease determination, 2) artificial data sets created from the HapMap resource, and 3) data sets previously used for testing other methods, in order to compare with these. Our method has the same accuracy as single marker association (SMA) in the simplest case of a single disease-causing mutation and a constant recombination rate. However, in more complex scenarios, with mutation heterogeneity and more complex haplotype structure such as that found in the HapMap data, our method outperforms SMA as well as other fast data mining approaches such as HapMiner and Haplotype Pattern Mining (HPM), despite being significantly faster. For unphased genotype data, an initial step of estimating the phase only slightly decreases the power of the method.
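The region-finding and tree-scoring steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Blossoc implementation: markers are biallelic (0/1) on phased haplotypes, compatibility is checked with the standard four-gamete test, the local perfect phylogeny is represented by its clades (rooted, as an assumption, at the first haplotype), and the score is a simplified accuracy in which every haplotype is routed to its smallest clade and each leaf predicts its majority status. The names `compatible`, `max_compatible_interval`, `clades`, `decision_tree_accuracy`, the `min_size` cut-off (which keeps singleton clades from trivially memorising the labels) and the toy data are all hypothetical. The option of admitting incompatible markers is not modelled.

```python
from collections import Counter, defaultdict


def compatible(haps, i, j):
    """Four-gamete test: binary markers i and j fit a single tree unless
    all four combinations 00, 01, 10 and 11 occur among the haplotypes."""
    return len({(h[i], h[j]) for h in haps}) < 4


def max_compatible_interval(haps, m):
    """Largest interval (l, r) containing marker m whose markers are pairwise
    compatible; ties keep the interval found first (l closest to m)."""
    n = len(haps[0])
    best = (m, m)
    l = m
    # Move the left end outwards while markers l..m stay pairwise compatible.
    while l >= 0 and all(compatible(haps, l, k) for k in range(l + 1, m + 1)):
        r = m
        # For this left end, push the right end as far as compatibility allows.
        while r + 1 < n and all(compatible(haps, k, r + 1) for k in range(l, r + 1)):
            r += 1
        if r - l > best[1] - best[0]:
            best = (l, r)
        l -= 1
    return best


def clades(haps, l, r):
    """Clades of the local perfect phylogeny, rooted at haplotype 0 (assumption):
    for each marker, the haplotypes carrying the allele that haplotype 0 lacks."""
    result = set()
    for k in range(l, r + 1):
        clade = frozenset(i for i, h in enumerate(haps) if h[k] != haps[0][k])
        if clade:
            result.add(clade)
    return result


def decision_tree_accuracy(haps, status, l, r, min_size=2):
    """Use the tree as a classifier: route each haplotype to its smallest clade
    of at least min_size members (the root if none), let every leaf predict its
    majority status, and return the fraction classified correctly."""
    kept = [c for c in clades(haps, l, r) if len(c) >= min_size]
    leaves = defaultdict(list)
    for i in range(len(haps)):
        containing = [c for c in kept if i in c]
        leaf = min(containing, key=len) if containing else frozenset()
        leaves[leaf].append(status[i])
    correct = sum(Counter(s).most_common(1)[0][1] for s in leaves.values())
    return correct / len(haps)


# Toy data (made up for illustration): 8 haplotypes x 5 markers, 1 = case.
HAPS = ["00001", "00000", "10000", "10000", "01101", "01100", "01110", "01000"]
STATUS = [0, 0, 0, 0, 1, 1, 1, 0]

for marker in range(len(HAPS[0])):
    l, r = max_compatible_interval(HAPS, marker)
    print(marker, (l, r), decision_tree_accuracy(HAPS, STATUS, l, r))
```

On this toy data the interval around marker 2 spans markers 0 to 3 and classifies all eight haplotypes correctly, whereas the interval around marker 4 stops at marker 3 and scores 0.625, no better than always predicting "control" (5 of 8). Checking pairs is enough because, for binary markers, pairwise four-gamete compatibility guarantees that the whole interval fits one perfect phylogeny.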
The method was also found to accurately localise the known susceptibility variant in an empirical data set (the DeltaF508 mutation for cystic fibrosis), and to find significant signals for association between the CYP2D6 gene and poor drug metabolism, although for the latter data set the highest association score lies about 60 kb from the CYP2D6 gene.
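For comparison, the single marker association (SMA) baseline mentioned in the results can be sketched as a per-marker test. The abstract does not say which single-marker statistic was used, so this sketch assumes a common choice: a Pearson chi-square test (1 degree of freedom, no continuity correction) on the 2x2 table of allele by case/control status, counted over haplotypes. The function names and toy data are hypothetical.

```python
import math


def sma_chi_square(alleles, status):
    """Pearson chi-square statistic for the 2x2 table allele (0/1) x status
    (0 = control, 1 = case); returns 0.0 if a row or column total is zero."""
    a = sum(1 for x, s in zip(alleles, status) if s == 1 and x == 1)  # cases, allele 1
    b = sum(1 for x, s in zip(alleles, status) if s == 1 and x == 0)  # cases, allele 0
    c = sum(1 for x, s in zip(alleles, status) if s == 0 and x == 1)  # controls, allele 1
    d = sum(1 for x, s in zip(alleles, status) if s == 0 and x == 0)  # controls, allele 0
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def chi2_1df_pvalue(chi2):
    """Upper tail of the chi-square distribution with 1 df: P(|Z| > sqrt(chi2))."""
    return math.erfc(math.sqrt(chi2 / 2))


# Toy haplotypes (0/1 per marker) and case/control status, made up for illustration.
HAPS = [
    [0, 0, 0, 0, 1], [0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0],
    [0, 1, 1, 0, 1], [0, 1, 1, 0, 0], [0, 1, 1, 1, 0], [0, 1, 0, 0, 0],
]
STATUS = [0, 0, 0, 0, 1, 1, 1, 0]

# Scan every marker independently and report the strongest single-marker signal.
scores = [sma_chi_square([h[k] for h in HAPS], STATUS) for k in range(len(HAPS[0]))]
best = max(range(len(scores)), key=scores.__getitem__)
print("best marker:", best, "chi2:", scores[best], "p:", chi2_1df_pvalue(scores[best]))
```

Here marker 2, whose alleles coincide exactly with disease status, gives the largest statistic; a single directly tagged causal mutation is the situation in which the abstract reports SMA and the tree-based method performing equally well.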

CONCLUSION

Our method has been implemented in the Blossoc (BLOck aSSOCiation) software. Using Blossoc, genome-wide chip-based surveys of 3 million SNPs in 1000 cases and 1000 controls can be analysed in less than two CPU hours.
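The throughput figure above implies an average time budget per SNP. A quick check, assuming the two CPU hours are spread evenly over the 3 million SNPs, gives an upper bound of about 2.4 ms per SNP:

```latex
\frac{2 \times 3600\ \mathrm{s}}{3 \times 10^{6}\ \text{SNPs}} = 2.4 \times 10^{-3}\ \mathrm{s} = 2.4\ \mathrm{ms}\ \text{per SNP}
```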

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c70a/1624851/f4ee1dc31aab/1471-2105-7-454-1.jpg

Similar articles

Genetic association mapping via evolution-based clustering of haplotypes.

PLoS Genet. 2007 Jul;3(7):e111. doi: 10.1371/journal.pgen.0030111.

Effects of single SNPs, haplotypes, and whole-genome LD maps on accuracy of association mapping.

Genet Epidemiol. 2007 Apr;31(3):179-88. doi: 10.1002/gepi.20199.

Haplotype-based quantitative trait mapping using a clustering algorithm.

BMC Bioinformatics. 2006 May 18;7:258. doi: 10.1186/1471-2105-7-258.

Linkage disequilibrium mapping identifies a 390 kb region associated with CYP2D6 poor drug metabolising activity.

Pharmacogenomics J. 2002;2(3):165-75. doi: 10.1038/sj.tpj.6500096.

Efficient whole-genome association mapping using local phylogenies for unphased genotype data.

Bioinformatics. 2008 Oct 1;24(19):2215-21. doi: 10.1093/bioinformatics/btn406. Epub 2008 Jul 30.

Data mining applied to linkage disequilibrium mapping.

Am J Hum Genet. 2000 Jul;67(1):133-45. doi: 10.1086/302954. Epub 2000 Jun 9.

Multipoint linkage-disequilibrium mapping narrows location interval and identifies mutation heterogeneity.

Proc Natl Acad Sci U S A. 2003 Nov 11;100(23):13442-6. doi: 10.1073/pnas.2235031100. Epub 2003 Nov 3.

FastTagger: an efficient algorithm for genome-wide tag SNP selection using multi-marker linkage disequilibrium.

BMC Bioinformatics. 2010 Jan 29;11:66. doi: 10.1186/1471-2105-11-66.

A spatial probit model for fine-scale mapping of disease genes.

Genet Epidemiol. 2007 Apr;31(3):252-60. doi: 10.1002/gepi.20206.

Cited by

Tree-based QTL mapping with expected local genetic relatedness matrices.

Am J Hum Genet. 2023 Dec 7;110(12):2077-2091. doi: 10.1016/j.ajhg.2023.10.017.

Tree-based QTL mapping with expected local genetic relatedness matrices.

bioRxiv. 2023 Apr 8:2023.04.07.536093. doi: 10.1101/2023.04.07.536093.

perfectphyloR: An R package for reconstructing perfect phylogenies.

BMC Bioinformatics. 2019 Dec 23;20(1):729. doi: 10.1186/s12859-019-3313-4.

Genome-wide compatible SNP intervals and their properties.

ACM Int Conf Bioinform Comput Biol (2010). 2010 Aug;2010:43-52. doi: 10.1145/1854776.1854788.

Comparing performance of non-tree-based and tree-based association mapping methods.

BMC Proc. 2016 Oct 18;10(Suppl 7):405-410. doi: 10.1186/s12919-016-0063-4. eCollection 2016.

New Genetic Approaches to AD: Lessons from APOE-TOMM40 Phylogenetics.

Curr Neurol Neurosci Rep. 2016 May;16(5):48. doi: 10.1007/s11910-016-0643-8.

Gene genealogies for genetic association mapping, with application to Crohn's disease.

Front Genet. 2013 Dec 2;4:260. doi: 10.3389/fgene.2013.00260. eCollection 2013.

Using ancestral information to detect and localize quantitative trait loci in genome-wide association studies.

BMC Bioinformatics. 2013 Jun 20;14:200. doi: 10.1186/1471-2105-14-200.

The effect of using genealogy-based haplotypes for genomic prediction.

Genet Sel Evol. 2013 Mar 6;45(1):5. doi: 10.1186/1297-9686-45-5.

Comparison of linear mixed model analysis and genealogy-based haplotype clustering with a Bayesian approach for association mapping in a pedigreed population.

BMC Proc. 2012 May 21;6 Suppl 2(Suppl 2):S4. doi: 10.1186/1753-6561-6-S2-S4.
