Methods and statistics for combining motif match scores.

Author information

Bailey TL, Gribskov M

Affiliation

San Diego Supercomputer Center, California 92186-9784, USA.

Publication information

J Comput Biol. 1998 Summer;5(2):211-21. doi: 10.1089/cmb.1998.5.211.

Abstract

Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores, and we evaluate each method for search quality (classification accuracy) and for the accuracy of its estimate of statistical significance. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm, which uses the product of p-values scoring method, is available for interactive use and downloading at URL http://www.sdsc.edu/MEME.
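To illustrate what estimating the significance of method 3) can involve, here is a minimal sketch. It assumes that each per-motif p-value is uniform on (0, 1] under the null hypothesis and that the p-values are independent. Under those assumptions, t = -ln(p1 * ... * pn) follows a Gamma(n, 1) distribution, so P(p1 * ... * pn <= x) = x * sum_{i=0}^{n-1} (-ln x)^i / i!. The function name `combined_pvalue` is hypothetical. This is not taken from the MAST source code, and the abstract does not state whether the paper uses exactly this closed form.

```python
import math
from typing import Sequence


def combined_pvalue(pvalues: Sequence[float]) -> float:
    """P-value of the product of n independent p-values (illustrative sketch).

    Assumption: each p-value is uniform on (0, 1] under the null hypothesis and
    the per-motif p-values are independent. Then t = -ln(p1 * ... * pn) is
    Gamma(n, 1) distributed, so
        P(product <= x) = x * sum_{i=0}^{n-1} (-ln x)^i / i!
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    # Sum logarithms instead of multiplying, so a product of many small
    # p-values does not underflow to 0.0 before the log is taken.
    t = -sum(math.log(p) for p in pvalues)

    # Accumulate sum_{i=0}^{n-1} t^i / i! term by term.
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= t / i
        total += term

    # Combine in log space, because exp(-t) on its own can underflow even
    # when the final p-value is representable.
    return min(1.0, math.exp(-t + math.log(total)))
```

For example, `combined_pvalue([0.01, 0.2, 0.5])` returns a value larger than the raw product 0.001. The correction is needed because a product of several uniform values tends to be smaller than any single one of them, so the raw product would overstate significance.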
