Pudimat Rainer, Schukat-Talamazzini Ernst-Günter, Backofen Rolf
Institut für Informatik, Friedrich-Schiller-Universität, Ernst-Abbe-Platz 3, D-07743 Jena, Germany.
Bioinformatics. 2005 Jul 15;21(14):3082-8. doi: 10.1093/bioinformatics/bti477. Epub 2005 May 19.
The identification of transcription factor binding sites in promoter sequences is an important problem, since it reveals information about the transcriptional regulation of genes. For analysing transcriptional regulation, computational approaches for predicting putative binding sites are applied. Commonly used stochastic models for binding sites are position-specific score matrices, which show weak predictive power.
We have developed a probabilistic modelling approach that makes it possible to consider diverse characteristic binding site properties and thereby obtain more accurate representations of binding sites. These properties are modelled as random variables in Bayesian networks, which are capable of dealing with dependencies among binding site properties. Cross-validation on several datasets shows improvements in the false positive error rate and in the significance (P-value) of true binding sites.
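For orientation, the position-specific score matrix baseline mentioned above can be sketched in a few lines of Python. This is a generic log-odds formulation, not the authors' implementation. The toy sites, the uniform background of 0.25, the pseudocount of 1, and all function names are assumptions made for illustration only. The sketch shows the independence assumption that the Bayesian network approach relaxes: each position contributes to the score separately, so dependencies between positions or other site properties cannot be expressed.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds score matrix from aligned sites of equal length."""
    length = len(sites[0])
    pssm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against a uniform background (assumed).
        pssm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pssm

def score(pssm, window):
    """Sum per-position scores; positions are treated as independent."""
    return sum(column[base] for column, base in zip(pssm, window))

def scan(pssm, sequence):
    """Score every window of the matrix width along the sequence."""
    width = len(pssm)
    return [(i, score(pssm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]

def best_hit(pssm, sequence):
    """Return (start index, score) of the highest-scoring window."""
    return max(scan(pssm, sequence), key=lambda hit: hit[1])

# Hypothetical toy binding sites, not taken from the paper's datasets.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATATT"]
pssm = build_pssm(sites)
position, best = best_hit(pssm, "GGCGTATAATGCC")
```

Because the score is a plain sum over columns, two sites with the same per-position bases always receive the same score regardless of which bases co-occur. A model with dependencies, such as a Bayesian network, would instead condition some variables on others.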