Poisson Guylaine, Chauve Cedric, Chen Xin, Bergeron Anne
Department of Information and Computer Sciences, University of Hawaii at Manoa, Honolulu, HI 96822, USA.
Genomics Proteomics Bioinformatics. 2007 May;5(2):121-30. doi: 10.1016/S1672-0229(07)60022-9.
A glycosylphosphatidylinositol (GPI) anchor is a common but complex C-terminal post-translational modification of extracellular proteins in eukaryotes. Here we investigate the problem of correctly annotating GPI-anchored proteins among the growing number of sequences in public databases. We developed a computational system, called FragAnchor, based on the tandem use of a neural network (NN) and a hidden Markov model (HMM). First, the NN selects potential GPI-anchored proteins in a dataset; the HMM then parses these potential GPI signals and refines the prediction by qualitative scoring. FragAnchor correctly predicted 91% of all the GPI-anchored proteins annotated in the Swiss-Prot database. In a large-scale analysis of 29 eukaryote proteomes, FragAnchor predicted that highly probable GPI-anchored proteins make up between 0.21% and 2.01% of each proteome. The distinctive feature of FragAnchor, compared with other systems, is that it targets only the C-terminus of a protein, which makes it less sensitive to the background noise found in databases and to possibly incomplete protein sequences. Moreover, FragAnchor can be used to predict GPI-anchored proteins in all eukaryotes. Finally, by using qualitative scoring, the predictions combine sensitivity with information content. The predictor is publicly available at [see text].
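The tandem design described in the abstract (a first-stage selector followed by a second-stage parser that assigns a qualitative label, both applied only to the C-terminus) can be illustrated with a minimal sketch. This is not the FragAnchor implementation: the 50-residue window, the two toy scoring functions, the thresholds, and the label names other than "highly probable" are all assumptions made for illustration, and a real system would plug trained NN and HMM models into the same two slots.

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumption: the abstract does not state the C-terminal window size.
FRAGMENT_LENGTH = 50

HYDROPHOBIC = set("AVLIMFWC")
SMALL = set("GASDNC")


def c_terminal_fragment(sequence: str, length: int = FRAGMENT_LENGTH) -> str:
    """Return the last `length` residues (the whole sequence if it is shorter)."""
    return sequence[-length:]


def toy_nn_score(fragment: str) -> float:
    """Toy stand-in for the NN: hydrophobic fraction of the last 15 residues."""
    tail = fragment[-15:]
    if not tail:
        return 0.0
    return sum(residue in HYDROPHOBIC for residue in tail) / len(tail)


def toy_hmm_score(fragment: str) -> float:
    """Toy stand-in for the HMM: fraction of small residues in the
    10 residues that precede the 15-residue tail."""
    region = fragment[-25:-15]
    if not region:
        return 0.0
    return sum(residue in SMALL for residue in region) / len(region)


def tandem_predict(
    sequence: str,
    nn_score: Callable[[str], float] = toy_nn_score,
    hmm_score: Callable[[str], float] = toy_hmm_score,
    nn_threshold: float = 0.5,  # hypothetical cutoff
    cutoffs: Sequence[Tuple[float, str]] = (
        (0.6, "highly probable"),  # hypothetical cutoffs and labels,
        (0.3, "probable"),         # listed from strictest to loosest
    ),
) -> Optional[str]:
    """Stage 1 filters candidates; stage 2 assigns a qualitative label.

    Returns None when stage 1 rejects the sequence.
    """
    fragment = c_terminal_fragment(sequence)
    if nn_score(fragment) < nn_threshold:
        return None
    score = hmm_score(fragment)
    for minimum, label in cutoffs:
        if score >= minimum:
            return label
    return "weakly probable"  # hypothetical lowest category


if __name__ == "__main__":
    candidate = "K" * 30 + "SGSASGNSAG" + "L" * 15
    print(tandem_predict(candidate))  # highly probable
    print(tandem_predict("K" * 60))   # None
```

Because both stages see only the C-terminal fragment, a sequence that is truncated or noisy at its N-terminus yields the same result as the full-length sequence, which is the robustness property the abstract highlights.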