Department of Computer Science and Software Engineering, University of Melbourne, Victoria, Australia.
Bioinformatics. 2010 Sep 15;26(18):i524-30. doi: 10.1093/bioinformatics/btq378.
Determining the functional impact of non-coding disease-associated single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) is challenging. Many of these SNPs are likely to be regulatory SNPs (rSNPs): variations which affect the ability of a transcription factor (TF) to bind to DNA. However, experimental procedures for identifying rSNPs are expensive and labour intensive. Therefore, in silico methods are required for rSNP prediction. By scoring the two alleles of a SNP with a TF position weight matrix (PWM), it can be determined which SNPs are likely rSNPs. However, predictions made in this manner are noisy, and no existing method determines the statistical significance of the effect of a nucleotide variation on a PWM score.
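The allele-scoring idea described above can be illustrated with a minimal sketch. It is not the is-rSNP implementation: the function names, the toy count matrix, the uniform background, the pseudocount, and the choice to take the best-scoring window overlapping the SNP are all assumptions made for illustration, and the reverse strand is ignored.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, background=None, pseudocount=1.0):
    """Turn per-position base counts into a log2-odds PWM (list of dicts)."""
    background = background or {b: 0.25 for b in BASES}  # assumed uniform
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm

def best_window_score(pwm, sequence, snp_index):
    """Highest PWM score over all forward-strand windows covering the SNP."""
    width = len(pwm)
    best = -math.inf
    for start in range(snp_index - width + 1, snp_index + 1):
        if start < 0 or start + width > len(sequence):
            continue  # window would run off the sequence
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        best = max(best, score)
    return best

def allele_scores(pwm, flank5, ref, alt, flank3):
    """Score the reference and alternate alleles in the same flanking context."""
    snp_index = len(flank5)
    ref_score = best_window_score(pwm, flank5 + ref + flank3, snp_index)
    alt_score = best_window_score(pwm, flank5 + alt + flank3, snp_index)
    return ref_score, alt_score

# Hypothetical 4-column count matrix whose consensus is TGAC.
toy_counts = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 17, "G": 1, "T": 1},
]
toy_pwm = log_odds_pwm(toy_counts)
ref_score, alt_score = allele_scores(toy_pwm, "CCT", "G", "C", "ACGG")
print(ref_score, alt_score)
```

In this toy case the reference allele completes the consensus site and the alternate allele breaks it, so the reference scores higher. A raw score difference like this carries no significance level on its own, which is the gap the abstract points to.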
We have designed an algorithm for in silico rSNP detection called is-rSNP. We employ novel convolution methods to determine the complete distributions of PWM scores and ratios between allele scores, facilitating assignment of statistical significance to rSNP effects. We have tested our method on 41 experimentally verified rSNPs, correctly predicting the disrupted TF in 28 cases. We also analysed 146 disease-associated SNPs with no known functional impact in an attempt to identify candidate rSNPs. Of the 11 significantly predicted disrupted TFs, 9 had previous evidence of being associated with the disease in the literature. These results demonstrate that is-rSNP is suitable for high-throughput screening of SNPs for potential regulatory function. This is a useful and important tool in the interpretation of GWAS.
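To give a sense of how convolution can yield a complete PWM score distribution, the sketch below builds the distribution of discretised scores column by column under an independent background model and reads a p-value off its upper tail. This is a generic textbook-style construction offered under stated assumptions, not the specific convolution method used in is-rSNP, and it covers only single-allele scores, not the distribution of ratios between allele scores. The function names, the `precision` grid, and the toy matrix are hypothetical.

```python
from collections import defaultdict

def score_distribution(pwm, background, precision=0.01):
    """Distribution of discretised PWM scores under an i.i.d. background.

    Scores are rounded to multiples of `precision` (an assumed grid) and
    each column's score distribution is convolved into the running total.
    Keys of the returned dict are integer grid units.
    """
    dist = {0: 1.0}  # before any column, the score is 0 with probability 1
    for column in pwm:
        nxt = defaultdict(float)
        for partial, prob in dist.items():
            for base, score in column.items():
                nxt[partial + round(score / precision)] += prob * background[base]
        dist = dict(nxt)
    return dist

def p_value(dist, score, precision=0.01):
    """Probability that a random site scores at least `score`."""
    threshold = round(score / precision)
    return sum(prob for s, prob in dist.items() if s >= threshold)

# Hypothetical two-column matrix: +1 for the preferred base, -1 otherwise.
toy_pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
dist = score_distribution(toy_pwm, uniform)
print(p_value(dist, 2.0))
```

Discretising scores keeps the number of distinct partial sums bounded, so the cost grows with matrix width times the score range rather than with the 4^width possible sites.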
is-rSNP software is available for use at: www.genomics.csse.unimelb.edu.au/is-rSNP.