Lee Marianne M, Chan Michael K, Bundschuh Ralf
The Ohio State Biophysics Program, Ohio State University, 484 W 12th Av., Columbus OH 43210-1117, USA.
Bioinformatics. 2008 Jun 1;24(11):1339-43. doi: 10.1093/bioinformatics/btn130. Epub 2008 Apr 10.
The deluge of biological information from different genomic initiatives and rapid advances in biotechnology have made bioinformatics tools an integral part of modern biology. Among the widely used sequence alignment tools, BLAST and PSI-BLAST are arguably the most popular. PSI-BLAST, which uses an iterative, profile-based search strategy built on a position-specific score matrix (PSSM), is more sensitive than BLAST in detecting weak homologies, making it suitable for remote homolog detection. Many refinements have been made to PSI-BLAST, and its computational efficiency and high specificity have been much touted. Nevertheless, corruption of its profile through the incorporation of false-positive sequences remains a major challenge.
We have developed a simple and elegant approach to resolve the problem of model corruption in PSI-BLAST searches. We hypothesized that combining results from the first (least-corrupted) profile with results from later (most sensitive) iterations of PSI-BLAST provides a better discriminator between true and false hits. Accordingly, we have derived a formula that uses the E-values from these two PSI-BLAST iterations to obtain a figure of merit for rank-ordering the hits. Our verification results, based on a 'gold-standard' test set, indicate that this figure of merit does indeed separate true positives from false positives better than PSI-BLAST E-values alone. Perhaps most notable about this strategy is that it is simple and straightforward to implement.
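The general idea of re-ranking hits by combining E-values from two iterations can be sketched as follows. The abstract does not state the actual formula, so the weighted sum of log10 E-values used here is only a placeholder assumption, not the authors' figure of merit. The function names, the `weight` parameter, and the `missing_e` default for hits absent from one iteration are likewise hypothetical.

```python
import math
from typing import Dict, List, Tuple


def figure_of_merit(e_first: float, e_later: float,
                    weight: float = 0.5, floor: float = 1e-300) -> float:
    """Placeholder score (NOT the published formula): a weighted sum of
    log10 E-values from the first and a later PSI-BLAST iteration.
    Lower values indicate more confident hits."""
    # Clamp at a small floor so an E-value reported as 0.0 does not break log10.
    log_first = math.log10(max(e_first, floor))
    log_later = math.log10(max(e_later, floor))
    return weight * log_first + (1.0 - weight) * log_later


def rank_hits(first_iter: Dict[str, float], later_iter: Dict[str, float],
              missing_e: float = 10.0, weight: float = 0.5) -> List[Tuple[str, float]]:
    """Rank hit IDs by the placeholder score, best (lowest) first.
    A hit missing from one iteration is assigned `missing_e`, an assumed
    stand-in for "not significant in that iteration"."""
    hit_ids = set(first_iter) | set(later_iter)
    scored = [
        (hit_id, figure_of_merit(first_iter.get(hit_id, missing_e),
                                 later_iter.get(hit_id, missing_e),
                                 weight))
        for hit_id in hit_ids
    ]
    # Sort by score, then by ID so that ties are ordered deterministically.
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    # Made-up E-values for illustration only.
    first = {"a": 1e-10, "b": 5.0}
    later = {"a": 1e-20, "b": 1e-30, "c": 1e-5}
    for hit_id, score in rank_hits(first, later):
        print(f"{hit_id}\t{score:.2f}")
```

In this toy input, hit "b" has the best E-value in the later iteration but is weak in the first, so the combined score ranks it below hit "a", which is supported by both iterations. This is the kind of behavior the approach aims for, although the published formula may weight the two iterations differently.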