Department of Medicine, The University of Chicago, Chicago, IL, USA.
Bioinformatics. 2010 Jan 15;26(2):259-62. doi: 10.1093/bioinformatics/btp644. Epub 2009 Nov 17.
Genome-wide association studies (GWAS) generate associations between hundreds of thousands of single nucleotide polymorphisms (SNPs) and complex phenotypes. The contribution of the traditionally overlooked copy number variations (CNVs) to complex traits is also being actively studied. To facilitate interpretation of these data and the design of follow-up experimental validation, we have developed a database that enables the sensible prioritization of these variants by combining several kinds of annotation: publicly available physical and functional annotations, multilocus linkage disequilibrium (LD) annotations and annotations of expression quantitative trait loci (eQTLs).
For each SNP, the SCAN database provides: (i) summary information from eQTL mapping of HapMap SNPs to gene expression (measured with the Affymetrix exon array) in the full set of HapMap CEU (Caucasians from Utah, USA) and YRI (Yoruba people from Ibadan, Nigeria) samples; (ii) LD information, in the case of a HapMap SNP, including which genes have variation in strong LD (pairwise or multilocus LD) with the variant and how well the SNP is covered by different high-throughput platforms; (iii) summary information available from public databases (e.g. physical and functional annotations); and (iv) summary information from other GWAS. For each gene, SCAN provides: (i) eQTLs for the gene (both local and distant SNPs) and (ii) the coverage of all HapMap variants at that gene on each high-throughput platform. For each genomic region, SCAN provides: (i) physical and functional annotations of all SNPs, genes and known CNVs within the region and (ii) all genes regulated by the eQTLs within the region.
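The per-SNP annotations listed above can be pictured as a single record per variant. The following is a minimal Python sketch of such a record, together with one possible way to prioritize variants by their strongest eQTL signal. It is an illustration only and not SCAN's actual schema or query interface: the class names, field names, the `prioritize` function, the 1e-4 p-value threshold, and the SNP and gene identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EqtlHit:
    """One SNP-to-gene expression association (hypothetical layout)."""
    gene: str
    population: str  # "CEU" or "YRI"
    p_value: float
    local: bool      # True for a local SNP, False for a distant one


@dataclass
class SnpAnnotation:
    """Annotation categories (i)-(iv) for one SNP (hypothetical layout)."""
    snp_id: str
    eqtl_hits: List[EqtlHit] = field(default_factory=list)   # (i) eQTL summary
    ld_genes: List[str] = field(default_factory=list)        # (ii) genes in strong LD
    platforms: List[str] = field(default_factory=list)       # (ii) platform coverage
    functional_class: Optional[str] = None                   # (iii) public annotation
    gwas_traits: List[str] = field(default_factory=list)     # (iv) other GWAS


def best_eqtl_p(snp: SnpAnnotation) -> float:
    """Smallest eQTL p-value for the SNP, or 1.0 if it has no eQTL hits."""
    return min((hit.p_value for hit in snp.eqtl_hits), default=1.0)


def prioritize(snps: List[SnpAnnotation], p_threshold: float = 1e-4) -> List[SnpAnnotation]:
    """Keep SNPs with an eQTL hit below the (assumed) threshold, strongest first."""
    kept = [snp for snp in snps if best_eqtl_p(snp) < p_threshold]
    return sorted(kept, key=best_eqtl_p)


# Made-up example records; identifiers and p-values are placeholders.
example_snps = [
    SnpAnnotation("snp_example_1", eqtl_hits=[EqtlHit("GENE_A", "CEU", 3e-6, True)]),
    SnpAnnotation(
        "snp_example_2",
        eqtl_hits=[
            EqtlHit("GENE_B", "YRI", 2e-8, False),
            EqtlHit("GENE_A", "CEU", 0.02, True),
        ],
    ),
    SnpAnnotation("snp_example_3"),  # no eQTL evidence
]

ranked = prioritize(example_snps)
print([snp.snp_id for snp in ranked])
```

Ranking on the eQTL p-value alone is a deliberate simplification; a fuller scheme might also weigh the LD, functional and GWAS fields carried on the same record.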
Supplementary data are available at Bioinformatics online.