Gribskov M, Robinson N L
San Diego Supercomputer Center, P.O. Box 85608, San Diego, CA 92186-9784, USA.
Comput Chem. 1996 Mar;20(1):25-33. doi: 10.1016/s0097-8485(96)80004-0.
In this paper, we borrow the idea of the receiver operating characteristic (ROC) from clinical medicine and demonstrate its application to sequence comparison. The ROC combines elements of both sensitivity and specificity, and is a quantitative measure of the usefulness of a diagnostic. The ROC is used in this work to investigate the effects of the scoring table and gap penalties on database searches. Studies of three protein families (4Fe-4S ferredoxins, lysR bacterial regulatory proteins, and bacterial RNA polymerase sigma-factors) lead to the following conclusions. Sequence families are quite idiosyncratic, but the best PAM distance for database searches using the Smith-Waterman method is about 200 PAM, somewhat larger than predicted by theoretical methods. The length-independent gap penalty (gap initiation penalty) is quite important, but search performance shows a broad peak at values of about 20-24. The length-dependent gap penalty (gap extension penalty) is almost irrelevant, suggesting that successful database searches rely only to a limited degree on gapped alignments. Taken together, these observations lead to the conclusion that the optimal conditions for alignments and database searches are not, and should not be expected to be, the same.
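As an illustration of how an ROC can score a database search, the following minimal Python sketch ranks hits by alignment score, traces the ROC curve, and integrates the area under it. It is a generic ROC calculation, not necessarily the exact statistic used in the paper, which the abstract does not specify. The function names `roc_points` and `roc_area` and the example scores are hypothetical.

```python
from typing import List, Tuple

Hit = Tuple[float, bool]  # (search score, True if the hit is a known family member)


def roc_points(hits: List[Hit]) -> List[Tuple[float, float]]:
    """Return ROC points as (false positive rate, true positive rate)."""
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    n_pos = sum(1 for _, member in ranked if member)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one family member and one non-member")

    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, member) in enumerate(ranked):
        if member:
            tp += 1  # sensitivity (true positive rate) rises
        else:
            fp += 1  # 1 - specificity (false positive rate) rises
        # Emit a point only after a whole group of tied scores is consumed.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def roc_area(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoid rule (1.0 = perfect ranking)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Made-up scores for illustration only: three members, three non-members.
    example = [(120, True), (95, True), (80, False),
               (70, True), (40, False), (30, False)]
    print(roc_area(roc_points(example)))
```

Repeating such a calculation for each combination of scoring table and gap penalties would give one number per parameter setting, which is the kind of comparison the abstract describes.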