Chen Rong, Morgan Alex A, Dudley Joel, Deshpande Tarangini, Li Li, Kodama Keiichi, Chiang Annie P, Butte Atul J
Stanford Center for Biomedical Informatics Research, 251 Campus Drive, Stanford, CA 94305, USA.
Genome Biol. 2008;9(12):R170. doi: 10.1186/gb-2008-9-12-r170. Epub 2008 Dec 5.
BACKGROUND: Candidate single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWASs) were often selected for validation based on their functional annotation, which was inadequate and biased. We propose to use the more than 200,000 microarray studies in the Gene Expression Omnibus to systematically prioritize candidate SNPs from GWASs.
RESULTS: We analyzed all human microarray studies from the Gene Expression Omnibus and calculated, for every human gene, the observed frequency of differential expression, which we called the differential expression ratio. Analysis of a comprehensive list of curated disease genes revealed a positive association between differential expression ratio values and the likelihood of harboring disease-associated variants. By considering highly differentially expressed genes, we were able to rediscover disease genes with 79% specificity and 37% sensitivity. We successfully distinguished true disease genes from false positives in multiple GWASs for multiple diseases. We then derived a list of functionally interpolating SNPs (fitSNPs) to analyze the top seven loci of the Wellcome Trust Case Control Consortium type 1 diabetes mellitus GWASs, rediscovered all type 1 diabetes mellitus genes, and predicted a novel gene (KIAA1109) for an unexplained locus, 4q27. We suggest that fitSNPs would work equally well for both Mendelian and complex diseases (being more effective for cancer), and we propose candidate genes to sequence for their association with 597 syndromes with unknown molecular basis.
CONCLUSIONS: Our study demonstrates that highly differentially expressed genes are more likely to harbor disease-associated DNA variants. FitSNPs can serve as an effective tool to systematically prioritize candidate SNPs from GWASs.
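The differential expression ratio and the sensitivity/specificity evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the definition used here (the fraction of studies measuring a gene in which that gene was called differentially expressed), the 0.5 threshold, the function names, and the toy gene labels are all assumptions made for illustration; this is not the authors' implementation.

```python
from collections import defaultdict


def differential_expression_ratio(studies):
    """Return {gene: ratio} for an iterable of (measured, differential) set pairs.

    Assumed definition: number of studies in which the gene was called
    differentially expressed, divided by the number of studies measuring it.
    """
    measured_count = defaultdict(int)
    differential_count = defaultdict(int)
    for measured, differential in studies:
        for gene in measured:
            measured_count[gene] += 1
        # Only count calls for genes that the study actually measured.
        for gene in differential & measured:
            differential_count[gene] += 1
    return {g: differential_count[g] / n for g, n in measured_count.items()}


def sensitivity_specificity(ratios, disease_genes, threshold):
    """Treat genes with ratio >= threshold as predicted disease genes."""
    tp = fp = tn = fn = 0
    for gene, ratio in ratios.items():
        predicted = ratio >= threshold
        actual = gene in disease_genes
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: four studies over three placeholder genes.
toy_studies = [
    ({"A", "B", "C"}, {"A"}),
    ({"A", "B"}, {"A", "B"}),
    ({"A", "C"}, {"A"}),
    ({"B", "C"}, set()),
]
toy_ratios = differential_expression_ratio(toy_studies)
toy_sens, toy_spec = sensitivity_specificity(toy_ratios, {"A", "B"}, threshold=0.5)
```

Raising the threshold in a sketch like this trades sensitivity for specificity, which is the kind of trade-off behind the reported 79% specificity and 37% sensitivity.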