Statistical significance of ungapped sequence alignments.

Author information

Alexandrov N N, Solovyev V V

Affiliation

Amgen Inc, Thousand Oaks, CA, USA.

Publication information

Pac Symp Biocomput. 1998:463-72.

PMID:9697204
Abstract

The statistical significance of a local sequence alignment depends not only on the similarity score and on the sequence lengths, but also on the length of the alignment. The dependence of alignment significance on the length of the sequences has been analyzed earlier, and is based on the idea that longer sequences have more chances to share a local similarity with a higher score. To the best of our knowledge, the dependence of statistical significance on the length of an alignment has not previously been used in selecting the best alignments. We have applied formulas for assessing the statistical significance of ungapped local alignments to real proteins. Let $L$ be the length of the alignment; then the expected value of the similarity score is $S_{\mathrm{exp}} = \langle m \rangle L$, where $\langle m \rangle$ is the expected similarity between two randomly chosen residues. The value of $\langle m \rangle$ can be calculated from a similarity (substitution) matrix $M$ and amino acid frequencies $P$: $\langle m \rangle = \sum_{i,j} p_i p_j m_{ij}$. The probability of observing a score $S$ greater than or equal to $x$ for an alignment of length $L$ is given by the normal distribution: $\mathrm{Prob}(S \ge x) = 1 - \Phi\left(\frac{x - S_{\mathrm{exp}}}{\sigma}\right) = 1 - \Phi\left(\frac{x - \langle m \rangle L}{\sigma_m \sqrt{L}}\right)$, where $\Phi$ is the integral of the standard normal density $N$ and $\sigma_m$ is the standard deviation of $m$. From these formulas we conclude that the best alignment should be selected using a normalized value of the similarity score: $S' = \max \left\{ \frac{S - \langle m \rangle L}{\sigma_m \sqrt{L}} \right\}$. The proposed normalization of the similarity score has been tested on a representative benchmark. To evaluate the performance of the normalization, we calculated several measures of recognition quality. Our normalization improved all of these measures. This procedure is important for choosing the correct alignment for homology modelling as well as for selecting distantly related sequences in databases.
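As an illustration of the normalization described in the abstract, the following is a minimal Python sketch, not the authors' implementation. The two-letter alphabet, the frequencies, the substitution matrix, and the candidate (score, length) pairs are all hypothetical toy values, and every function name is invented for this example. It also assumes that the standard deviation of the pair score is computed from the same residue frequencies as its mean, which the abstract does not state explicitly.

```python
import math


def pair_score_moments(freqs, matrix):
    """Mean and standard deviation of the score of two independently drawn residues."""
    mean = sum(freqs[a] * freqs[b] * matrix[a][b] for a in freqs for b in freqs)
    second = sum(freqs[a] * freqs[b] * matrix[a][b] ** 2 for a in freqs for b in freqs)
    return mean, math.sqrt(second - mean ** 2)


def normalized_score(score, length, mean, std):
    """Length-normalized score: (S - mean * L) / (std * sqrt(L))."""
    return (score - mean * length) / (std * math.sqrt(length))


def upper_tail_probability(z):
    """Normal approximation to the probability of a normalized score >= z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical toy alphabet and scoring scheme (assumptions, not from the paper).
FREQS = {"A": 0.5, "B": 0.5}
MATRIX = {"A": {"A": 1, "B": -1}, "B": {"A": -1, "B": 1}}

mean, std = pair_score_moments(FREQS, MATRIX)

# Hypothetical candidate ungapped alignments as (raw score, alignment length).
candidates = [(10, 25), (12, 64)]

# Select the candidate with the highest normalized score rather than raw score.
best = max(candidates, key=lambda c: normalized_score(c[0], c[1], mean, std))

for score, length in candidates:
    z = normalized_score(score, length, mean, std)
    print(score, length, round(z, 3), upper_tail_probability(z))
print("selected:", best)
```

In this toy case the shorter alignment is selected even though its raw score is lower, because its score lies further above what its length would produce by chance.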