Daeyaert F, Moereels H, Lewi P J
Center for Molecular Design, Janssen Research Foundation, Vosselaar, Belgium.
Comput Methods Programs Biomed. 1998 Jun;56(3):221-33. doi: 10.1016/s0169-2607(98)00031-5.
Unaligned amino acid sequences can be characterized by their composition of amino acid n-tuples (i.e. doublets, triplets, quadruplets, etc.). In this study we used a set of G-protein-coupled receptor (GPCR) sequences to investigate the performance of two statistics derived from n-tuple counts, termed commonality and specificity. The commonality of a tuple is defined as its relative occurrence in the sequences that belong to a given GPCR subtype. The specificity of a tuple is derived from its relative occurrence in the sequences of a given GPCR subtype and from its relative non-occurrence in the sequences that do not belong to this subtype. A graphical presentation, termed 'polygram', is described for the visualization of common and specific tuples. The method can be applied to the classification of unknown GPCR sequences. It can also be applied to the identification of fragments of GPCRs, such as may occur in chimeric receptors. The method is generally applicable to other protein families and other types of coding.
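The two statistics can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the choices here are assumptions: "relative occurrence" is taken as the fraction of subtype sequences that contain the tuple at least once, and specificity is taken as that fraction multiplied by the fraction of non-subtype sequences that lack the tuple. The function names (`tuples_of`, `commonality`, `specificity`) are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def tuples_of(seq, n):
    """Return the set of distinct overlapping n-tuples in an unaligned sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def commonality(sequences, labels, subtype, n):
    """Fraction of the sequences labelled `subtype` that contain each n-tuple.

    Assumption: presence/absence per sequence, not the raw number of occurrences.
    """
    members = [s for s, lab in zip(sequences, labels) if lab == subtype]
    if not members:
        return {}
    counts = defaultdict(int)
    for s in members:
        for t in tuples_of(s, n):
            counts[t] += 1
    return {t: c / len(members) for t, c in counts.items()}


def specificity(sequences, labels, subtype, n):
    """Commonality inside the subtype times relative non-occurrence outside it.

    Assumption: the product form is one plausible reading of the abstract,
    not necessarily the formula used by the authors.
    """
    common = commonality(sequences, labels, subtype, n)
    others = [tuples_of(s, n) for s, lab in zip(sequences, labels) if lab != subtype]
    result = {}
    for t, c in common.items():
        outside = sum(t in ts for ts in others) / len(others) if others else 0.0
        result[t] = c * (1.0 - outside)
    return result
```

Under these assumptions, a tuple scores high on specificity only when it is both frequent within the subtype and rare elsewhere. Ranking the tuples of an unknown sequence by such scores for each subtype is one way the classification use mentioned in the abstract could be approached.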