Rodi Diane J, Mandava Suneeta, Makowski Lee
Combinatorial Biology Unit, Biosciences Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439, USA.
Bioinformatics. 2004 Dec 12;20(18):3481-9. doi: 10.1093/bioinformatics/bth432. Epub 2004 Jul 29.
Multiple alignments of proteins are an effective way of identifying conserved amino acids that provide clues to functional relationships among proteins. Quantitation of the abundances of amino acids found at each position in a sequence motif can provide a basis for understanding the structural and functional constraints at each point. Distribution of information across a motif has been used previously, but the non-intuitive nature of the analysis has limited its impact.
Here, we introduce a quantitative measure of amino acid sequence diversity (DIVAA) that has a simple, intuitive meaning. Diversity, as a measure of sequence conservation or variation, is inextricably linked to the probability of selecting identical pairs from a distribution. We demonstrate its utility through the analysis of four populations: ATP-binding P-loops, hypervariable domains of kappa light chains, signal sequences, and the N- and C-termini of proteins. DIVAA provides a simple means to generate hypotheses concerning the contribution of individual residues to the functional and evolutionary relationships among proteins.
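The link between diversity and the probability of drawing an identical pair can be illustrated with a short sketch. The abstract does not state the formula, so the code assumes one common form: the pair-identity probability at a position is the sum of squared residue frequencies, and diversity is its inverse normalized by the 20 standard amino acids. Under that assumption a fully conserved position scores 0.05 and a position with all 20 residues equally represented scores 1.0. The function names, the gap handling, and the toy sequences are hypothetical and are not taken from the DIVAA software.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def pair_identity_probability(column):
    """Chance that two residues drawn (with replacement) from one
    alignment column are identical: the sum of squared frequencies."""
    # Assumption: gaps and non-standard symbols are ignored.
    residues = [r for r in column.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("column contains no standard amino acids")
    total = len(residues)
    counts = Counter(residues)
    return sum((n / total) ** 2 for n in counts.values())


def column_diversity(column):
    """Assumed diversity measure: 1 / (20 * pair-identity probability)."""
    return 1.0 / (len(AMINO_ACIDS) * pair_identity_probability(column))


def alignment_diversity(sequences):
    """Per-position diversity for a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must share one length")
    return [column_diversity("".join(col)) for col in zip(*sequences)]


if __name__ == "__main__":
    # Toy aligned fragments invented for illustration only.
    toy_alignment = ["GAGKS", "GSGKT", "GTGKS", "GVGKT"]
    for position, d in enumerate(alignment_diversity(toy_alignment), start=1):
        print(f"position {position}: diversity = {d:.3f}")
```

Positions that print values near 0.05 are strongly conserved in the toy alignment, while higher values indicate positions that tolerate more substitutions.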
The DIVAA software is available through RELIC (http://relic.bio.anl.gov).