Nakai K, Kanehisa M
Institute for Chemical Research, Kyoto University, Japan.
Proteins. 1991;11(2):95-110. doi: 10.1002/prot.340110203.
We have developed an expert system that makes use of various kinds of knowledge organized as "if-then" rules for predicting protein localization sites in Gram-negative bacteria, given the amino acid sequence information alone. We considered four localization sites: the cytoplasm, the inner (cytoplasmic) membrane, the periplasm, and the outer membrane. Most rules were derived from experimental observations. For example, the rule to recognize an inner membrane protein is the presence of either a hydrophobic stretch in the predicted mature protein or an uncleavable N-terminal signal sequence. Lipoproteins are first recognized by a consensus pattern and then assumed present at either the inner or outer membrane. These two possibilities are further discriminated by examining an acidic residue in the mature N-terminal portion. Furthermore, we found an empirical rule that periplasmic and outer membrane proteins were successfully discriminated by their different amino acid composition. Overall, our system could predict 83% of the localization sites of proteins in our database.
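The "if-then" rule chain described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Features` class, its field names, the `predict_site` function, the order in which the rules are tried, the cytoplasm fallback, the direction of the acidic-residue rule, and the sign convention of `composition_score` are all assumptions made for the example. The abstract does not specify any of them. The sketch also assumes the sequence features have already been computed by some upstream step.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Hypothetical pre-computed sequence features (all names are illustrative)."""
    cleavable_signal: bool               # cleavable N-terminal signal sequence found
    uncleavable_signal: bool             # uncleavable N-terminal signal sequence found
    hydrophobic_stretch_in_mature: bool  # hydrophobic stretch in the predicted mature protein
    lipoprotein_consensus: bool          # lipoprotein consensus pattern matched
    acidic_near_mature_n_terminus: bool  # acidic residue in the mature N-terminal portion
    composition_score: float             # assumed convention: > 0 means outer-membrane-like


def predict_site(f: Features) -> str:
    """Apply the rules in an assumed order and return one of four site names."""
    # Lipoproteins: recognized by consensus pattern, then split between the
    # two membranes by the acidic-residue check. Mapping "acidic -> inner
    # membrane" is an assumption; the abstract does not give the direction.
    if f.lipoprotein_consensus:
        if f.acidic_near_mature_n_terminus:
            return "inner membrane"
        return "outer membrane"

    # Inner membrane rule as stated in the abstract: either condition suffices.
    if f.hydrophobic_stretch_in_mature or f.uncleavable_signal:
        return "inner membrane"

    # Assumed fallback: a protein with no signal sequence stays in the cytoplasm.
    if not f.cleavable_signal:
        return "cytoplasm"

    # Remaining exported proteins: periplasm vs. outer membrane, separated by
    # amino acid composition (reduced here to a single hypothetical score).
    return "outer membrane" if f.composition_score > 0 else "periplasm"
```

For example, a protein with a cleavable signal sequence, no hydrophobic stretch in the mature part, no lipoprotein pattern, and a negative composition score would fall through to the last rule and be labeled periplasmic under these assumptions.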