Department of Computer Engineering, Kyungpook National University, Daegu 702-701, South Korea.
BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S37. doi: 10.1186/1471-2105-11-S1-S37.
The introduction of spaced seeds opened a way to improve sensitivity in homology search without loss of search speed. Since then, efforts to find the optimal seed, the one that maximizes sensitivity, have continued to this day. The sensitivity of a seed is generally measured by its hit probability. However, hit probability measures sensitivity only at one specific similarity level, whereas homologous regions are usually distributed across various similarity levels. As a result, the seed that is optimal under hit probability is not necessarily optimal across various similarity levels. Therefore, a new measure of seed sensitivity is required to recommend seeds that are robust to various similarity levels.
We propose a new probability model of sensitivity, hit integration, which covers a range of similarity levels of homologous regions. We also propose a novel algorithm for computing hit integration, based on integrating hit probabilities over a range of similarity levels. We prove that hit integration is computable by expressing its integral part as a recursive formula that can be easily solved by dynamic programming. Experimental results on biological data show that hit integration identifies better seeds than those of PatternHunter.
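To make the idea concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes a homologous region of fixed length in which each position matches independently with probability p, computes the hit probability of a seed by a simple dynamic program over the most recent positions, and then approximates hit integration as the plain integral of hit probability over a similarity range using composite Simpson's rule. The numerical quadrature stands in for the recursive integral formula mentioned above, and the unnormalized integral is itself an assumption about the definition. The names `hit_probability`, `hit_integration`, `p_lo`, `p_hi` and `intervals` are illustrative.

```python
def hit_probability(seed: str, length: int, p: float) -> float:
    """Probability that `seed` hits a random region of `length` positions.

    `seed` is a string of '1' (must match) and '0' (don't care).
    Assumption: each position matches independently with probability p.
    """
    span = len(seed)
    care = int(seed, 2)            # bitmask of positions that must match
    keep = (1 << (span - 1)) - 1   # keeps the most recent span-1 positions
    # no_hit[state] = probability of reaching `state` without any hit so far
    no_hit = {0: 1.0}
    for t in range(1, length + 1):
        nxt = {}
        for state, prob in no_hit.items():
            for bit, weight in ((1, p), (0, 1.0 - p)):
                window = (state << 1) | bit
                # A hit needs a full window and all required positions matched.
                if t >= span and window & care == care:
                    continue  # this probability mass leaves the no-hit set
                key = window & keep
                nxt[key] = nxt.get(key, 0.0) + prob * weight
        no_hit = nxt
    return 1.0 - sum(no_hit.values())


def hit_integration(seed: str, length: int, p_lo: float, p_hi: float,
                    intervals: int = 200) -> float:
    """Approximate the integral of hit probability over [p_lo, p_hi].

    Uses composite Simpson's rule (a numerical stand-in, labeled as an
    assumption; the paper evaluates the integral by a recursive formula).
    """
    if intervals % 2:
        intervals += 1  # Simpson's rule needs an even number of intervals
    h = (p_hi - p_lo) / intervals
    total = hit_probability(seed, length, p_lo) + hit_probability(seed, length, p_hi)
    for i in range(1, intervals):
        coeff = 4 if i % 2 else 2
        total += coeff * hit_probability(seed, length, p_lo + i * h)
    return total * h / 3.0


if __name__ == "__main__":
    # Compare two short, hypothetical seeds of equal weight over a similarity range.
    for candidate in ("111", "1101"):
        print(candidate, hit_integration(candidate, 16, 0.5, 0.9))
```

The state space of this sketch grows as 2^(span-1), so it is practical only for short seeds; it is meant to illustrate how a single score can summarize sensitivity over a range of similarity levels rather than at one fixed level.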
The presented model estimates sensitivity more generally than hit probability because it relaxes the requirement of a single fixed similarity level. We propose a novel algorithm that directly computes sensitivity over a range of similarity levels.