Wang Junbai, Batmanov Kirill
Pathology Department, Oslo University Hospital-Norwegian Radium Hospital, Montebello 0310, Oslo, Norway
Nucleic Acids Res. 2015 Dec 2;43(21):e147. doi: 10.1093/nar/gkv733. Epub 2015 Jul 21.
Sequence variations in regulatory DNA regions can have functionally important consequences for gene expression. Such variations may play an essential role in determining phenotypes and may be linked to disease; however, identifying them in massive genome-wide sequencing data is a major challenge. In this work, we propose a new computational pipeline, a Bayesian method for protein-DNA interaction with binding affinity ranking (BayesPI-BAR), for quantifying the effect of sequence variations on protein binding. BayesPI-BAR uses biophysical modeling of protein-DNA interactions to predict single nucleotide polymorphisms (SNPs) that significantly change the binding affinity of a regulatory region for transcription factors (TFs). The method includes two parameters (TF chemical potentials, or protein concentrations, and direct TF binding targets) that previous methods neglect. The method is verified on 67 known human regulatory SNPs; for 47 of them (70%), the true TF is ranked among the top 10 predictions. Importantly, BayesPI-BAR, which uses principal component analysis to integrate multiple predictions made at various TF chemical potentials, outperforms existing programs such as sTRAP and is-rSNP when evaluated on the same SNPs. BayesPI-BAR is publicly available and supports parallelized computation, which helps to investigate large numbers of TFs or SNPs and to detect disease-associated regulatory sequence variations in the sea of genome-wide noncoding regions.
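The two ideas named in the abstract, binding predictions that depend on a TF chemical potential and a principal component analysis that merges predictions across several chemical potentials, can be illustrated with a minimal sketch. This is not the BayesPI-BAR implementation. It assumes a generic occupancy model of the form P = 1 / (1 + exp(E - mu)), with energies in units of kT, forward-strand scanning only, and plain centered PCA. The energy matrices, sequences, chemical potential values, and all function names (`toy_matrix`, `occupancy`, `shift_matrix`, `first_pc_scores`) are hypothetical and chosen only for illustration.

```python
import numpy as np

BASES = "ACGT"

def toy_matrix(consensus, penalty=2.0):
    """Hypothetical energy matrix: 0 for the consensus base, `penalty` otherwise."""
    m = np.full((len(consensus), 4), penalty)
    for pos, base in enumerate(consensus):
        m[pos, BASES.index(base)] = 0.0
    return m

def binding_energy(site, energy_matrix):
    """Sum of per-position energies for a site as long as the matrix."""
    cols = [BASES.index(b) for b in site]
    return float(energy_matrix[np.arange(len(site)), cols].sum())

def occupancy(seq, energy_matrix, mu):
    """Highest occupancy over all forward-strand windows (assumed model)."""
    width = energy_matrix.shape[0]
    energies = np.array([binding_energy(seq[i:i + width], energy_matrix)
                         for i in range(len(seq) - width + 1)])
    return float(np.max(1.0 / (1.0 + np.exp(energies - mu))))

def shift_matrix(ref, alt, matrices, mus):
    """Rows: TFs. Columns: chemical potentials. Entry: occupancy(alt) - occupancy(ref)."""
    return np.array([[occupancy(alt, m, mu) - occupancy(ref, m, mu) for mu in mus]
                     for m in matrices])

def first_pc_scores(shifts):
    """Project centered per-TF shifts onto the first principal component."""
    centered = shifts - shifts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = vt[0]
    if pc1.sum() < 0:  # fix the arbitrary sign so that a binding gain scores positive
        pc1 = -pc1
    return centered @ pc1

# Toy data (all hypothetical): the SNP G>C disrupts an ACGT site.
TF_NAMES = ["TF_A", "TF_B", "TF_C"]
MATRICES = [toy_matrix("ACGT"), toy_matrix("GGGG"), toy_matrix("AAAA")]
MUS = [-2.0, 0.0, 2.0]
REF = "TTACGTTT"
ALT = "TTACCTTT"

shifts = shift_matrix(REF, ALT, MATRICES, MUS)
scores = first_pc_scores(shifts)
# Rank TFs by the magnitude of their combined score, largest first.
ranking = [TF_NAMES[i] for i in np.argsort(-np.abs(scores))]
print(ranking)
```

Because the PCA is centered, each score is relative to the mean shift across the TFs considered, so the ranking is only meaningful when many TFs are scored together. The sign convention (positive for a gain in binding) is a choice made here, since the direction of a principal component is otherwise arbitrary.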