Generating samples for association studies based on HapMap data.

Author information

Li Jing, Chen Yixuan

Affiliation

Electrical Engineering and Computer Science Department, Case Western Reserve University, Cleveland, OH 44106, USA.

Publication information

BMC Bioinformatics. 2008 Jan 24;9:44. doi: 10.1186/1471-2105-9-44.

DOI:10.1186/1471-2105-9-44

PMID:18218094

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2375120/

Abstract

BACKGROUND

With the completion of the HapMap project, a variety of computational algorithms and tools have been proposed for haplotype inference, tag SNP selection and genome-wide association studies. Simulated data are commonly used to evaluate these newly developed approaches. In addition to simulations based on population models, empirical data generated by perturbing real data have also been used, because such data may inherit specific properties of the real data. However, no publicly available tool can generate large-scale simulated variation data while taking into account knowledge from the HapMap project.

RESULTS

A computer program (gs) was developed to quickly generate large numbers of samples based on real data. These samples serve a variety of purposes, including evaluating methods for haplotype inference, tag SNP selection and association studies. Two approaches have been implemented to generate dense SNP haplotype/genotype data whose local linkage disequilibrium (LD) patterns are similar to those in human populations. The first approach takes haplotype pairs from samples as input; the second takes patterns of haplotype block structure as input. Both quantitative and qualitative traits are supported. Phenotypes are generated either from a disease model or from the effect of a quantitative trait nucleotide, both of which can be specified by users. In addition to single-locus disease models, two-locus disease models have been implemented that can incorporate any degree of epistasis; users can specify all nine parameters of a 3 × 3 penetrance table. For several commonly used two-locus disease models, the program can also calculate penetrances automatically from the population prevalence and the marginal effects of a disease, both of which users can specify conveniently.
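The relationship between a two-locus 3 × 3 penetrance table, the population prevalence and the marginal effects can be illustrated with a short sketch. This is not the gs implementation: it assumes Hardy-Weinberg equilibrium at each locus and no LD between the two disease loci (gs itself draws genotypes from haplotypes with realistic LD), and every function name and penetrance value below is hypothetical. Under these assumptions the prevalence is K = Σ_ij P(G_A = i) P(G_B = j) f_ij, where f_ij is the penetrance for i and j copies of the risk alleles. The sketch also shows a simple additive quantitative-trait-nucleotide (QTN) model.

```python
import random

def hwe_freqs(p):
    """Genotype frequencies for 0, 1, 2 copies of a risk allele with frequency p (HWE assumed)."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]

def prevalence(pen, p_a, p_b):
    """K = sum_ij P(G_A=i) P(G_B=j) f_ij, assuming the two loci are unlinked (no LD)."""
    fa, fb = hwe_freqs(p_a), hwe_freqs(p_b)
    return sum(fa[i] * fb[j] * pen[i][j] for i in range(3) for j in range(3))

def marginal_penetrance_a(pen, p_b):
    """P(affected | G_A = i), averaging the penetrance over locus-B genotypes."""
    fb = hwe_freqs(p_b)
    return [sum(fb[j] * pen[i][j] for j in range(3)) for i in range(3)]

def sample_case_control(pen, p_a, p_b, n_cases, n_controls, seed=0):
    """Draw individuals at random, assign status from the penetrance table,
    and keep drawing until both the case and control quotas are filled."""
    rng = random.Random(seed)
    fa, fb = hwe_freqs(p_a), hwe_freqs(p_b)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        g_a = rng.choices(range(3), weights=fa)[0]
        g_b = rng.choices(range(3), weights=fb)[0]
        if rng.random() < pen[g_a][g_b]:
            if len(cases) < n_cases:
                cases.append((g_a, g_b))
        elif len(controls) < n_controls:
            controls.append((g_a, g_b))
    return cases, controls

def qtn_trait(g, effect, rng, mean=0.0, sd=1.0):
    """Additive QTN model: trait = mean + effect * (risk-allele count) + Gaussian noise."""
    return mean + effect * g + rng.gauss(0.0, sd)

# Hypothetical epistatic table: risk is raised only when BOTH loci carry a risk allele.
PEN = [[0.01, 0.01, 0.01],
       [0.01, 0.10, 0.10],
       [0.01, 0.10, 0.10]]

if __name__ == "__main__":
    print(prevalence(PEN, 0.2, 0.2))          # 0.021664
    print(marginal_penetrance_a(PEN, 0.2))    # approx. [0.01, 0.0424, 0.0424]
    cases, controls = sample_case_control(PEN, 0.2, 0.2, 100, 100)
    print(len(cases), len(controls))          # 100 100
```

Drawing individuals until the case and control quotas are filled is a simple rejection-style design. It is easy to verify, but it becomes slow when the prevalence is very low.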

CONCLUSION

The program gs can effectively generate large-scale genetic and phenotypic variation data for evaluating newly developed approaches. It is freely available from the authors' web site at http://www.eecs.case.edu/~jxl175/gs.html.
