Department of Genetics, Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, NH 03756, USA.
Bioinformatics. 2010 Feb 15;26(4):445-55. doi: 10.1093/bioinformatics/btp713. Epub 2010 Jan 6.
The sequencing of the human genome has made it possible to identify an informative set of >1 million single nucleotide polymorphisms (SNPs) across the genome that can be used to carry out genome-wide association studies (GWASs). The availability of massive amounts of GWAS data has necessitated the development of new biostatistical methods for quality control, imputation and analysis issues including multiple testing. This work has been successful and has enabled the discovery of new associations that have been replicated in multiple studies. However, it is now recognized that most SNPs discovered via GWAS have small effects on disease susceptibility and thus may not be suitable for improving health care through genetic testing. One likely explanation for the mixed results of GWAS is that the current biostatistical analysis paradigm is by design agnostic or unbiased in that it ignores all prior knowledge about disease pathobiology. Further, the linear modeling framework that is employed in GWAS often considers only one SNP at a time thus ignoring their genomic and environmental context. There is now a shift away from the biostatistical approach toward a more holistic approach that recognizes the complexity of the genotype-phenotype relationship that is characterized by significant heterogeneity and gene-gene and gene-environment interaction. We argue here that bioinformatics has an important role to play in addressing the complexity of the underlying genetic basis of common human diseases. The goal of this review is to identify and discuss those GWAS challenges that will require computational methods.
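The limitation of testing one SNP at a time can be illustrated with a minimal sketch. It is not taken from the review: the toy cohort, the 0/1 carrier coding, and the helper name `chi_square_2x2` are all assumptions made for illustration. The phenotype is constructed as a pure interaction (XOR) of two SNPs, so each SNP on its own shows no association while the pair together predicts the phenotype perfectly.

```python
from itertools import product


def chi_square_2x2(pairs):
    """Pearson chi-square statistic for (exposure, outcome) pairs, both coded 0/1.

    Hypothetical helper for this sketch; assumes every row and column total is non-zero.
    """
    n = len(pairs)
    counts = {cell: 0 for cell in product((0, 1), repeat=2)}
    for exposure, outcome in pairs:
        counts[(exposure, outcome)] += 1
    stat = 0.0
    for exposure, outcome in counts:
        row_total = counts[(exposure, 0)] + counts[(exposure, 1)]
        col_total = counts[(0, outcome)] + counts[(1, outcome)]
        expected = row_total * col_total / n
        stat += (counts[(exposure, outcome)] - expected) ** 2 / expected
    return stat


# Assumed toy cohort: 25 individuals for each combination of carrier status
# at two SNPs (a, b); the phenotype y is their XOR, i.e. a pure interaction.
cohort = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2) for _ in range(25)]

# One-SNP-at-a-time tests: each SNP is tested against the phenotype alone.
single_a = chi_square_2x2([(a, y) for a, b, y in cohort])
single_b = chi_square_2x2([(b, y) for a, b, y in cohort])

# Joint test: the two-SNP combination is tested against the phenotype.
joint = chi_square_2x2([(a ^ b, y) for a, b, y in cohort])

# Bonferroni-style per-test threshold for an assumed 1 million SNPs at alpha = 0.05.
per_test_alpha = 0.05 / 1_000_000

print(single_a, single_b, joint, per_test_alpha)
```

In this constructed case both single-SNP statistics are zero, so no significance threshold, however lenient, would flag either SNP, whereas the joint statistic equals the sample size. Real data are noisier, and the number of SNP pairs grows quadratically, which is part of why such analyses call for computational methods.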