Hazelrig J B, Jones M K, Segrest J P
Department of Biostatistics, University of Alabama, Birmingham Academic Health Sciences Center 35294.
Biophys J. 1993 Jun;64(6):1827-32. doi: 10.1016/S0006-3495(93)81553-4.
Multiple amphipathic alpha-helical candidate domains have been identified in exchangeable apolipoproteins by sequence analysis and indirect experimental evidence. The distribution of charged residues can differ within and between these apolipoproteins. Segrest et al. (Segrest, J. P., H. DeLoof, J. G. Dohlman, C. G. Brouillette, and G. M. Anantharamaiah. 1990. Proteins. 8:103-117.) argued that these differences are correlated with lipid affinity. A mathematically defined motif for the particular charge distribution associated with high lipid affinity (class A) is proposed. Primary sequence data from protein segments proposed previously to have an amphipathic alpha-helical structure are scanned. Counting formulas are presented for determining the conditional probability that the match between an observed charge distribution and the proposed motif would occur by chance. Because the preselected helical segments are short (the modal length is 22 residues) and the motif definition imposes multiple constraints on the acceptable distributions, the computer-based algorithm is computationally feasible. Nineteen of the 20 segments previously assigned to class A match the motif sufficiently well (the remaining one is borderline), while very few others "erroneously" pass the screening test. These results confirm the original assignments of the candidate domains and, thus, support the hypothesis that there is a distinguishable subset of helices having high lipid affinity. This counting approach is applicable to a growing subset of protein sequence analysis problems in which the segment lengths are short and the motif is complex.
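The idea of a chance-match probability conditioned on charge composition can be illustrated with a small sketch. The abstract does not state the motif definition or the counting formulas, so everything specific here is an assumption made for illustration: the predicate `toy_motif`, its angular windows, the choice of polar-face center, and all function names are hypothetical stand-ins, and the wheel uses the idealized 100 degrees per residue of an alpha-helix. The sketch places a fixed number of positive and negative charges at random positions in a short segment and counts, by brute-force enumeration rather than by the paper's formulas, how many placements satisfy the toy motif.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def helical_angle(i, step=100.0):
    """Angle (degrees) of residue i on an idealized alpha-helical wheel."""
    return (i * step) % 360.0


def angular_distance(a, b):
    """Smallest absolute angular separation between two wheel angles."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def toy_motif(pos_idx, neg_idx, center=0.0,
              neg_max=40.0, pos_min=60.0, pos_max=100.0):
    """Hypothetical stand-in for a class A-like charge motif (an assumption,
    not the published definition): every negative charge lies within
    neg_max degrees of the polar-face center, and every positive charge
    lies in a flanking band between pos_min and pos_max degrees from it."""
    neg_ok = all(angular_distance(helical_angle(i), center) <= neg_max
                 for i in neg_idx)
    pos_ok = all(pos_min <= angular_distance(helical_angle(i), center) <= pos_max
                 for i in pos_idx)
    return neg_ok and pos_ok


def chance_match_counts(n, n_pos, n_neg, center=0.0):
    """Enumerate every placement of n_pos positive and n_neg negative
    charges in a segment of length n; return (matches, total)."""
    total = comb(n, n_pos) * comb(n - n_pos, n_neg)
    matches = 0
    for pos_idx in combinations(range(n), n_pos):
        remaining = [i for i in range(n) if i not in pos_idx]
        for neg_idx in combinations(remaining, n_neg):
            if toy_motif(pos_idx, neg_idx, center):
                matches += 1
    return matches, total


def chance_match_probability(n, n_pos, n_neg, center=0.0):
    """Probability of a chance match, conditional on the charge counts."""
    matches, total = chance_match_counts(n, n_pos, n_neg, center)
    return Fraction(matches, total)


if __name__ == "__main__":
    # Small illustrative segment; an assumed composition, not data from the paper.
    print(chance_match_probability(11, 2, 2))
```

Because the two angular windows in this toy motif do not overlap, the number of matching placements also equals a product of binomial coefficients (ways to choose negative positions inside the central window times ways to choose positive positions inside the flanking band). This shows on a small scale why short segments and heavily constrained motifs keep such counts tractable. For segments near the modal length of 22 with many charged residues, plain enumeration in Python becomes slow, and a closed-form count of this kind would be preferable.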