Klein P, Kanehisa M, DeLisi C
Biochim Biophys Acta. 1984 Jun 28;787(3):221-6. doi: 10.1016/0167-4838(84)90312-1.
The protein superfamilies in the National Biomedical Research Foundation sequence database cluster into six groups that can be distinguished on the basis of four variables characterizing amino acid composition and local sequence properties. The variables are average hydrophobicity, net charge, sequence length, and periodic variation in hydrophobic residues along the chain. The clusters they distinguish are: globins; chromosomal proteins; contractile system proteins and respiratory proteins other than cytochromes; enzyme inhibitors and toxins; enzymes except hydrolases; and all other proteins. The overall probability of correctly allocating a given protein to one of these functional groups is 0.76, with the allocation reliability being highest for globins (0.97) and for chromosomal proteins (0.93).
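As an illustration of the four variables named in the abstract, the following minimal sketch computes them for a single sequence. The abstract does not say which hydrophobicity scale, charge convention, or periodicity measure was used, so everything here is an assumption: the Kyte-Doolittle scale, a net charge that counts only Lys/Arg as +1 and Asp/Glu as -1, and a Fourier power of the hydrophobicity profile at a period of 3.6 residues. The function name `sequence_features` and its `period` parameter are hypothetical.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values (an assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq, period=3.6):
    """Return the four descriptors for a one-letter amino acid sequence.

    `period` (in residues) is an assumed value for the periodicity term.
    Residues outside the 20 standard amino acids raise KeyError.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    profile = [KYTE_DOOLITTLE[aa] for aa in seq]
    n = len(profile)
    mean_h = sum(profile) / n

    # Simplified net charge: basic residues minus acidic residues.
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")

    # Fourier power of the mean-centred hydrophobicity profile at the
    # chosen period, normalised by sequence length.
    omega = 2 * math.pi / period
    amplitude = sum(
        (h - mean_h) * cmath.exp(1j * omega * k) for k, h in enumerate(profile)
    )
    periodicity = abs(amplitude) ** 2 / n

    return {
        "mean_hydrophobicity": mean_h,
        "net_charge": positive - negative,
        "length": n,
        "periodicity": periodicity,
    }


if __name__ == "__main__":
    # Arbitrary example peptide, not taken from the paper.
    print(sequence_features("MKTAYIAKQRQISFVKSHFSRQ"))
```

A feature vector of this kind could then be passed to any standard classifier to allocate a protein to one of the six groups; the abstract does not state which allocation procedure produced the reported probabilities.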