Simple is beautiful: a straightforward approach to improve the delineation of true and false positives in PSI-BLAST searches.

Author information

Lee Marianne M, Chan Michael K, Bundschuh Ralf

Affiliations

The Ohio State Biophysics Program, Ohio State University, 484 W 12th Av., Columbus OH 43210-1117, USA.

Publication information

Bioinformatics. 2008 Jun 1;24(11):1339-43. doi: 10.1093/bioinformatics/btn130. Epub 2008 Apr 10.

Abstract

MOTIVATION

The deluge of biological information from different genomic initiatives and the rapid advancement in biotechnologies have made bioinformatics tools an integral part of modern biology. Among the widely used sequence alignment tools, BLAST and PSI-BLAST are arguably the most popular. PSI-BLAST, which uses an iterative profile position specific score matrix (PSSM)-based search strategy, is more sensitive than BLAST in detecting weak homologies, thus making it suitable for remote homolog detection. Many refinements have been made to improve PSI-BLAST, and its computational efficiency and high specificity have been much touted. Nevertheless, corruption of its profile via the incorporation of false positive sequences remains a major challenge.

RESULTS

We have developed a simple and elegant approach to resolve the problem of model corruption in PSI-BLAST searches. We hypothesized that combining results from the first (least-corrupted) profile with results from later (most sensitive) iterations of PSI-BLAST provides a better discriminator for true and false hits. Accordingly, we have derived a formula that utilizes the E-values from these two PSI-BLAST iterations to obtain a figure of merit for rank-ordering the hits. Our verification results based on a 'gold-standard' test set indicate that this figure of merit does indeed delineate true positives from false positives better than PSI-BLAST E-values. Perhaps what is most notable about this strategy is that it is simple and straightforward to implement.
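The abstract does not state the formula itself, so the following is only a minimal sketch of the general idea: score each hit from its E-values in the first and in a later PSI-BLAST iteration, then rank by that score. The combination used here (the sum of the two log10 E-values), the function names `figure_of_merit` and `rank_hits`, and the constants `E_FLOOR` and `E_MISSING` are all hypothetical stand-ins, not the authors' published figure of merit.

```python
import math

# Assumption: clamp E-values of 0.0 (very strong hits) before taking logs.
E_FLOOR = 1e-180
# Assumption: E-value assigned to a hit that is absent from one iteration.
E_MISSING = 10.0


def figure_of_merit(e_first, e_later):
    """Hypothetical stand-in score: sum of log10 E-values (lower is better).

    This is NOT the formula from the paper, which the abstract does not give.
    """
    return math.log10(max(e_first, E_FLOOR)) + math.log10(max(e_later, E_FLOOR))


def rank_hits(first_iter, later_iter):
    """Return hit IDs ordered from most to least convincing.

    first_iter, later_iter: dicts mapping hit ID -> E-value, taken from the
    first and from a later PSI-BLAST iteration respectively.
    """
    hits = set(first_iter) | set(later_iter)
    scored = {
        h: figure_of_merit(
            first_iter.get(h, E_MISSING), later_iter.get(h, E_MISSING)
        )
        for h in hits
    }
    # Tie-break on the hit ID so the ordering is deterministic.
    return sorted(scored, key=lambda h: (scored[h], h))


# Made-up E-values for illustration only.
first = {"hitA": 1e-5, "hitB": 1e-6}
later = {"hitA": 1e-40, "hitB": 1e-30, "hitC": 1e-35}
ranking = rank_hits(first, later)
print(ranking)
```

With these made-up numbers, ranking by the later-iteration E-value alone would place hitC ahead of hitB. The combined score reverses that order because hitC has no support from the first, least-corrupted profile.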

