Friberg Markus, von Rohr Peter, Gonnet Gaston
Institute of Computational Science, ETH, 8092 Zurich, Switzerland.
BMC Bioinformatics. 2005 Apr 4;6:84. doi: 10.1186/1471-2105-6-84.
Transcription factor binding site (TFBS) prediction is a difficult problem that requires a good scoring function to discriminate real binding sites from background noise. Many scoring functions have been proposed in the literature, but their relative performance is hard to assess because they are implemented in different software tools that use different search methods and different TFBS representations.
Here we compare how several scoring functions perform on both real and semi-simulated data sets in a common test environment. We have also developed two new scoring functions and included them in the comparison. The data sets are drawn from the yeast (S. cerevisiae) genome. Our new scoring function LLBG (least likely under the background model) performs best in this study, achieving the best average rank for the correct motifs. Scoring functions based on positional bias perform quite poorly.
LLBG may provide an interesting alternative to current scoring functions for TFBS prediction.
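The evaluation criterion named above, the average rank of the correct motif, can be illustrated with a small sketch. The abstract does not define LLBG, so the score used here is only an assumed stand-in: the log-probability of a candidate site under a zero-order (single-nucleotide) background model, with the least likely candidate ranked first. All function names are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter


def background_freqs(sequences):
    """Estimate zero-order (single-nucleotide) background frequencies."""
    counts = Counter(base for seq in sequences for base in seq)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


def background_log_prob(site, freqs):
    """Log-probability of a site under the assumed background model."""
    return sum(math.log(freqs[base]) for base in site)


def rank_of_correct(candidates, correct, freqs):
    """1-based rank of the correct motif.

    Assumption: candidates that are less likely under the background
    model are ranked first (lowest log-probability at rank 1).
    """
    ordered = sorted(candidates, key=lambda s: background_log_prob(s, freqs))
    return ordered.index(correct) + 1


def average_rank(ranks):
    """Mean rank of the correct motif across data sets."""
    return sum(ranks) / len(ranks)


# Toy data, purely illustrative (not taken from the study).
freqs = background_freqs(["AAAAAAGT"])
candidates = ["AAA", "AGT", "AAT"]
ranks = [
    rank_of_correct(candidates, "AGT", freqs),
    rank_of_correct(candidates, "AAA", freqs),
]
print(average_rank(ranks))
```

A lower average rank means the scoring function places the correct motif nearer the top of the candidate list, which is the sense in which one scoring function is said to outperform another here.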