Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes.

Authors

Karlin S, Altschul S F

Affiliation

Department of Mathematics, Stanford University, CA 94305.

Publication

Proc Natl Acad Sci U S A. 1990 Mar;87(6):2264-8. doi: 10.1073/pnas.87.6.2264.

Abstract

An unusual pattern in a nucleic acid or protein sequence or a region of strong similarity shared by two or more sequences may have biological significance. It is therefore desirable to know whether such a pattern can have arisen simply by chance. To identify interesting sequence patterns, appropriate scoring values can be assigned to the individual residues of a single sequence or to sets of residues when several sequences are compared. For single sequences, such scores can reflect biophysical properties such as charge, volume, hydrophobicity, or secondary structure potential; for multiple sequences, they can reflect nucleotide or amino acid similarity measured in a wide variety of ways. Using an appropriate random model, we present a theory that provides precise numerical formulas for assessing the statistical significance of any region with high aggregate score. A second class of results describes the composition of high-scoring segments. In certain contexts, these permit the choice of scoring systems which are "optimal" for distinguishing biologically relevant patterns. Examples are given of applications of the theory to a variety of protein sequences, highlighting segments with unusual biological features. These include distinctive charge regions in transcription factors and protooncogene products, pronounced hydrophobic segments in various receptor and transport proteins, and statistically significant subalignments involving the recently characterized cystic fibrosis gene.
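The abstract describes the theory without stating its formulas. The following is a minimal sketch of how the significance of a high-scoring segment is commonly computed under this kind of theory, assuming an independent-residue random model with a negative expected score and at least one positive score. The function names are illustrative, and the constant `K` is treated as a supplied input rather than derived here. The sketch finds the maximal segment score, solves for the scale parameter lambda as the positive root of sum_i p_i * exp(lambda * s_i) = 1, computes the residue frequencies expected inside high-scoring segments, and applies the extreme-value approximation P(S >= x) ~ 1 - exp(-K * n * exp(-lambda * x)).

```python
import math


def max_segment_score(scores):
    """Highest aggregate score over all contiguous segments (single scan)."""
    best = 0.0
    running = 0.0
    for s in scores:
        # Reset to zero whenever the running total would go negative.
        running = max(0.0, running + s)
        best = max(best, running)
    return best


def solve_lambda(probs, scores, tol=1e-12):
    """Positive root of sum_i p_i * exp(lam * s_i) = 1, found by bisection.

    Assumes residues are drawn independently with probabilities `probs`.
    """
    mean = sum(p * s for p, s in zip(probs, scores))
    if mean >= 0 or max(scores) <= 0:
        raise ValueError("need a negative expected score and a positive score")

    def f(lam):
        return sum(p * math.exp(lam * s) for p, s in zip(probs, scores)) - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) <= 0:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


def target_frequencies(probs, scores, lam):
    """Residue frequencies expected within high-scoring segments."""
    return [p * math.exp(lam * s) for p, s in zip(probs, scores)]


def segment_p_value(x, n, lam, K):
    """Approximate P(max segment score >= x) for a sequence of length n.

    K is an assumed input; computing it from the scoring scheme is
    outside the scope of this sketch.
    """
    return -math.expm1(-K * n * math.exp(-lam * x))


# Toy two-letter scoring scheme: +1 with probability 0.25, -1 with 0.75.
probs, scores = [0.25, 0.75], [1, -1]
lam = solve_lambda(probs, scores)
q = target_frequencies(probs, scores, lam)
```

For this toy scheme the defining equation reduces to x^2 - 4x + 3 = 0 with x = exp(lambda), so lambda = ln 3, and the frequencies inside high-scoring segments are the background frequencies swapped (0.75 and 0.25). Reading the same relation in reverse, scores proportional to ln(q_i / p_i) are the ones matched to a chosen set of target frequencies q_i, which is one way to interpret the abstract's remark about "optimal" scoring systems.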
