A Fast Implementation of a Scan Statistic for Identifying Chromosomal Patterns of Genome Wide Association Studies.

Authors

Sun Yan V, Jacobsen Douglas M, Turner Stephen T, Boerwinkle Eric, Kardia Sharon L R

Affiliation

Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan.

Publication

Comput Stat Data Anal. 2009 Mar 15;53(5):1794-1801. doi: 10.1016/j.csda.2008.04.013.

DOI: 10.1016/j.csda.2008.04.013

PMID: 20161066

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2747781/

Abstract

To account for the complex genomic distribution of single nucleotide polymorphism (SNP) variation when identifying chromosomal regions with significant SNP effects, a SNP association scan statistic was developed. To address the computational needs of genome-wide association (GWA) studies, a fast Java application (ChromoScan-GWA) was developed and implemented; it combines single-locus SNP tests with a scan statistic to identify chromosomal regions containing significant clusters of significant SNP effects. To illustrate this application, SNP associations were analyzed in a pharmacogenomic study of the blood-pressure-lowering effect of thiazide diuretics (N=195) using the Affymetrix Human Mapping 100K Set. A total of 55,335 tagSNPs (pairwise linkage disequilibrium R² < 0.5) were selected to reduce the frequency correlation between SNPs. A typical workstation can complete the whole-genome scan, including 10,000 permutation tests, within 3 hours. The most significant regions are located on chromosomes 3, 6, 13, and 16; two of these regions contain candidate genes that may be involved in the underlying drug-response mechanism. The computational performance and scalability of ChromoScan-GWA were tested with up to 1,000,000 SNPs and up to 4,000 subjects. Using 10,000 permutations, the computation time grew linearly across these datasets. This scan statistic application provides a robust statistical and computational foundation for identifying genomic regions associated with disease, and provides a method for comparing GWA results even across different genotyping platforms.
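The abstract describes the core procedure only briefly: a single-locus test at every SNP, a scan statistic that looks for windows along the chromosome containing clusters of significant SNPs, and permutation tests for significance. The Python sketch below illustrates that general idea under stated assumptions; it is not the ChromoScan-GWA implementation (a Java application). Assumed here: additive 0/1/2 genotype coding, a Pearson correlation test with a Fisher z normal approximation as the single-locus test, a window defined as a fixed number of consecutive SNPs, and a per-SNP threshold `alpha`. The function names `snp_pvalue`, `window_counts`, and `scan` are hypothetical, and the paper's actual window definition and test statistic may differ.

```python
# Hypothetical sketch of a SNP-cluster scan statistic; not ChromoScan-GWA's code.
import math
import random
from statistics import NormalDist


def snp_pvalue(genotypes, phenotype):
    """Single-locus test (an assumption for this sketch): Pearson correlation
    between additive genotype codes (0/1/2) and a quantitative phenotype,
    with a two-sided p-value from the Fisher z normal approximation.
    Requires more than 3 subjects."""
    n = len(genotypes)
    mg = sum(genotypes) / n
    mp = sum(phenotype) / n
    sxy = sum((g - mg) * (y - mp) for g, y in zip(genotypes, phenotype))
    sxx = sum((g - mg) ** 2 for g in genotypes)
    syy = sum((y - mp) ** 2 for y in phenotype)
    if sxx == 0 or syy == 0:
        return 1.0  # monomorphic SNP or constant phenotype: no evidence
    r = sxy / math.sqrt(sxx * syy)
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite when |r| == 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def window_counts(pvalues, alpha, width):
    """Scan statistic per window: number of SNPs with p < alpha among
    `width` consecutive SNPs (SNPs assumed sorted by chromosomal position)."""
    hits = [1 if p < alpha else 0 for p in pvalues]
    count = sum(hits[:width])
    counts = [count]
    for i in range(width, len(hits)):
        count += hits[i] - hits[i - width]  # slide the window by one SNP
        counts.append(count)
    return counts


def scan(genotype_matrix, phenotype, alpha=0.05, width=10, n_perm=1000, seed=0):
    """genotype_matrix: one list of genotype codes per SNP, in map order.
    Returns the observed window counts and, for each window, a permutation
    p-value against the null distribution of the maximum window count."""
    def counts_for(y):
        pvals = [snp_pvalue(g, y) for g in genotype_matrix]
        return window_counts(pvals, alpha, width)

    observed = counts_for(phenotype)
    rng = random.Random(seed)
    y = list(phenotype)  # copy so the caller's phenotype is not shuffled
    null_max = []
    for _ in range(n_perm):
        # Shuffling the phenotype breaks genotype-phenotype association
        # while keeping the correlation (LD) among SNPs intact.
        rng.shuffle(y)
        null_max.append(max(counts_for(y)))
    # Comparing with the maximum over all windows adjusts for the many windows scanned.
    return observed, [
        (1 + sum(m >= c for m in null_max)) / (n_perm + 1) for c in observed
    ]
```

Shuffling the phenotype vector, rather than the genotypes, is one common way to build the null distribution because it leaves the correlation structure among SNPs untouched; taking the maximum window count in each permutation gives a scan-wide adjustment for the many overlapping windows. Each permutation repeats every single-locus test, so run time in this sketch grows roughly linearly with the number of SNPs, subjects, and permutations. That is consistent in spirit with the linear scaling reported above, but it should not be read as a model of ChromoScan-GWA's own performance.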


Similar articles

A scan statistic for identifying chromosomal patterns of SNP association.

Genet Epidemiol. 2006 Nov;30(7):627-35. doi: 10.1002/gepi.20173.

ChromoScan: a scan statistic application for identifying chromosomal regions in genomic studies.

Bioinformatics. 2006 Dec 1;22(23):2945-7. doi: 10.1093/bioinformatics/btl503. Epub 2006 Oct 10.

ParallABEL: an R library for generalized parallelization of genome-wide association studies.

BMC Bioinformatics. 2010 Apr 29;11:217. doi: 10.1186/1471-2105-11-217.

A 100K genome-wide association scan for diabetes and related traits in the Framingham Heart Study: replication and integration with other genome-wide datasets.

Diabetes. 2007 Dec;56(12):3063-74. doi: 10.2337/db07-0451. Epub 2007 Sep 11.

"Replicated" genome wide association for dependence on illegal substances: genomic regions identified by overlapping clusters of nominally positive SNPs.

Am J Med Genet B Neuropsychiatr Genet. 2011 Mar;156(2):125-38. doi: 10.1002/ajmg.b.31143. Epub 2010 Dec 16.

Uncovering networks from genome-wide association studies via circular genomic permutation.

G3 (Bethesda). 2012 Sep;2(9):1067-75. doi: 10.1534/g3.112.002618. Epub 2012 Sep 1.

RS-SNP: a random-set method for genome-wide association studies.

BMC Genomics. 2011 Mar 30;12:166. doi: 10.1186/1471-2164-12-166.

Genome-wide "pleiotropy scan" identifies HNF1A region as a novel pancreatic cancer susceptibility locus.

Cancer Res. 2011 Jul 1;71(13):4352-8. doi: 10.1158/0008-5472.CAN-11-0124. Epub 2011 Apr 15.

Fine mapping by composite genome-wide association analysis.

Genet Res (Camb). 2017 Jun 6;99:e4. doi: 10.1017/S0016672317000027.

Cited by

Maximal Segmental Score Method for Localizing Recessive Disease Variants Based on Sequence Data.

Front Genet. 2020 Jun 12;11:555. doi: 10.3389/fgene.2020.00555. eCollection 2020.

Exome resequencing and GWAS for growth, ecophysiology, and chemical and metabolomic composition of wood of Populus trichocarpa.

BMC Genomics. 2019 Nov 20;20(1):875. doi: 10.1186/s12864-019-6160-9.

A latent variable partial least squares path modeling approach to regional association and polygenic effect with applications to a human obesity study.

PLoS One. 2012;7(2):e31927. doi: 10.1371/journal.pone.0031927. Epub 2012 Feb 27.

Region-based analysis in genome-wide association study of Framingham Heart Study blood lipid phenotypes.

BMC Proc. 2009 Dec 15;3 Suppl 7:S127. doi: 10.1186/1753-6561-3-s7-s127.

