Secreted protein prediction system combining CJ-SPHMM, TMHMM, and PSORT.

Author information

Chen Yunjia, Yu Peng, Luo Jingchu, Jiang Ying

Affiliations

College of Life Sciences, National Laboratory of Protein Engineering and Plant Genetic Engineering, and Centre of Bioinformatics, Peking University, Beijing 100871, China.

Publication information

Mamm Genome. 2003 Dec;14(12):859-65. doi: 10.1007/s00335-003-2296-6.

Abstract

To increase the coverage of secreted protein prediction, we describe a combination strategy. Instead of relying on a single method, we combine the Hidden Markov Model (HMM)-based methods CJ-SPHMM and TMHMM with PSORT. CJ-SPHMM is an HMM-based signal peptide prediction method, while TMHMM is an HMM-based transmembrane (TM) protein prediction algorithm. With CJ-SPHMM and TMHMM, proteins with a predicted signal peptide and without predicted TM regions are taken as putative secreted proteins. This HMM-based approach predicts secreted proteins with an accuracy (Ac) of 0.82 and a correlation coefficient (Cc) of 0.75, similar to PSORT with an Ac of 0.82 and a Cc of 0.76. When we further complement the HMM-based method (CJ-SPHMM + TMHMM) with PSORT, the Ac increases to 0.86 and the Cc to 0.81. Applying this combination strategy to the International Protein Index (IPI) maintained at the European Bioinformatics Institute (EBI), we constructed a putative human secretome of 5235 proteins. The prediction system described here can also be applied to predicting secreted proteins from other vertebrate proteomes.
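The decision rule in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm. "Complementing" the HMM-based call with PSORT is modelled here as a logical OR, which fits the stated aim of increasing coverage. Ac is taken to be the fraction of correct calls, and Cc is taken to be the Matthews correlation coefficient. The `Prediction` container, the function names, and the toy data are hypothetical; none of them correspond to the actual interfaces or outputs of CJ-SPHMM, TMHMM, or PSORT.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Prediction:
    """Per-protein calls from the three tools (hypothetical container)."""
    has_signal_peptide: bool  # signal peptide call, e.g. from CJ-SPHMM
    has_tm_region: bool       # transmembrane region call, e.g. from TMHMM
    psort_secreted: bool      # secreted call, e.g. from PSORT


def hmm_secreted(p: Prediction) -> bool:
    # Rule stated in the abstract: predicted signal peptide, no predicted TM region.
    return p.has_signal_peptide and not p.has_tm_region


def combined_secreted(p: Prediction) -> bool:
    # ASSUMPTION: complementing with PSORT is modelled as a logical OR.
    return hmm_secreted(p) or p.psort_secreted


def accuracy_and_cc(predicted, actual):
    """Return (Ac, Cc) for boolean calls.

    ASSUMPTION: Ac = (TP + TN) / N and Cc = Matthews correlation coefficient.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    ac = (tp + tn) / len(pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return ac, cc


# Toy data invented for illustration only; not taken from the paper.
proteins = [
    Prediction(True, False, True),    # secreted, found by both approaches
    Prediction(True, False, False),   # secreted, found by the HMM rule only
    Prediction(False, False, True),   # secreted, found by PSORT only
    Prediction(True, True, False),    # membrane protein with a signal peptide
    Prediction(False, False, False),  # intracellular protein
    Prediction(False, True, False),   # membrane protein
]
truth = [True, True, True, False, False, False]

hmm_calls = [hmm_secreted(p) for p in proteins]
combined_calls = [combined_secreted(p) for p in proteins]

hmm_ac, hmm_cc = accuracy_and_cc(hmm_calls, truth)
combined_ac, combined_cc = accuracy_and_cc(combined_calls, truth)
```

In this toy set the HMM rule alone misses the protein that only PSORT flags, so the combined call recovers it. A real evaluation would need a curated benchmark of known secreted and non-secreted proteins.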
