
Some useful statistical properties of position-weight matrices.

Author information

Claverie J M

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894.

Publication information

Comput Chem. 1994 Sep;18(3):287-94. doi: 10.1016/0097-8485(94)85024-0.

Abstract

Position-weight matrices (or profiles) are simple mathematical objects traditionally used to capture information about the local sequence patterns (or motifs) characteristic of a given structure or function. Although weight matrices lend themselves to fast database-scanning algorithms, their use has been limited by the lack of a reliable method for assessing the statistical significance of matching scores. In this article I first review three different computation schemes for designing weight matrices from a block alignment of any (small or large) number of sequences. I then show that, for patterns spanning 10 positions or more, the best scores expected from matching random sequences follow the extreme value (Gumbel) distribution. The threshold of statistical significance derived from this distribution perfectly delineates the range of scores characterizing "true positive" sequences (biologically significant matches). This result allows weight matrices to be used to scan an entire protein database for patterns with high sensitivity. MODEST (MOtif DEsign and Search Tools), a suite of programs in Unix/C, implements these statistical improvements and is available upon e-mail request (jmc@ncbi.nlm.nih.gov).
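The pipeline outlined in the abstract is sketched below as a minimal Python example. It builds a weight matrix from a block alignment, takes the best window score of each random sequence, fits a Gumbel distribution to those best scores, and reads off a significance cutoff. Several parts are assumptions and do not come from the article. The matrix uses additive pseudocounts and a uniform background, which is one generic log-odds scheme and not necessarily any of the three the article reviews. The Gumbel parameters are estimated by the method of moments, because the abstract does not say how they were fitted. The example block and query are invented. All function names (`build_pwm`, `best_score`, `fit_gumbel`, `gumbel_pvalue`, `gumbel_threshold`) are hypothetical and are not MODEST's interface.

```python
import math
import random

# Assumption: the 20 standard amino acids, sampled uniformly as background.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
EULER_GAMMA = 0.5772156649015329


def build_pwm(block, pseudocount=1.0, background=None):
    """Log-odds weight matrix from an ungapped block alignment.

    Assumption: additive pseudocounts and a uniform background; one generic
    scheme, not necessarily one of those reviewed in the article.
    """
    width = len(block[0])
    if any(len(row) != width for row in block):
        raise ValueError("all rows of the block must have the same length")
    if background is None:
        background = {a: 1.0 / len(ALPHABET) for a in ALPHABET}
    total = len(block) + pseudocount * len(ALPHABET)
    pwm = []
    for j in range(width):
        counts = {a: 0 for a in ALPHABET}
        for row in block:
            counts[row[j]] += 1
        pwm.append({a: math.log((counts[a] + pseudocount) / total / background[a])
                    for a in ALPHABET})
    return pwm


def best_score(pwm, seq):
    """Highest score over all full-width windows of seq (seq must be >= width)."""
    width = len(pwm)
    return max(sum(pwm[j][seq[i + j]] for j in range(width))
               for i in range(len(seq) - width + 1))


def fit_gumbel(samples):
    """Method-of-moments fit of a Gumbel (maximum) law; returns (mu, beta)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi
    return mean - EULER_GAMMA * beta, beta


def gumbel_pvalue(score, mu, beta):
    """P(best random score >= score); expm1 keeps tiny p-values accurate."""
    return -math.expm1(-math.exp(-(score - mu) / beta))


def gumbel_threshold(alpha, mu, beta):
    """Score whose Gumbel upper-tail probability equals alpha."""
    return mu - beta * math.log(-math.log1p(-alpha))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 10-column block (the abstract's result concerns widths >= 10).
    block = ["ACDEFGHIKL", "ACDEFGHIKM", "ACDEYGHIKL", "SCDEFGHIKL"]
    pwm = build_pwm(block)
    # Random sequences with uniform i.i.d. composition (an assumption).
    random_seqs = ["".join(rng.choice(ALPHABET) for _ in range(300))
                   for _ in range(500)]
    mu, beta = fit_gumbel([best_score(pwm, s) for s in random_seqs])
    cutoff = gumbel_threshold(0.01, mu, beta)
    hit = best_score(pwm, "MKTACDEFGHIKLPQ")
    print(f"mu={mu:.2f} beta={beta:.2f} 1% cutoff={cutoff:.2f}")
    print(f"query score={hit:.2f} p={gumbel_pvalue(hit, mu, beta):.2e}")
```

The best score per random sequence is fitted, not every window score. This matches the abstract's claim that the *best* random-match scores follow the Gumbel law. For a real search, the random sequences would need to reflect the database's composition and length distribution, not the uniform model assumed here.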
