Signal-3L: A 3-layer approach for predicting signal peptides.

Authors

Shen Hong-Bin, Chou Kuo-Chen

Affiliation

Gordon Life Science Institute, San Diego, CA 92130, USA.

Publication information

Biochem Biophys Res Commun. 2007 Nov 16;363(2):297-303. doi: 10.1016/j.bbrc.2007.08.140. Epub 2007 Aug 31.

Abstract

Functioning as an "address tag" that directs nascent proteins to their proper cellular and extracellular locations, signal peptides have become a crucial tool in finding new drugs or reprogramming cells for gene therapy. To use such a tool effectively and in a timely manner, however, the first requirement is an automated method for rapidly and accurately identifying the signal peptide of a given nascent protein. With the avalanche of new protein sequences generated in the post-genomic era, this challenge has become even more urgent. In this paper, we develop a novel method for predicting signal peptide sequences and their cleavage sites in human, plant, animal, eukaryotic, Gram-positive, and Gram-negative protein sequences, respectively. The new predictor, called Signal-3L, consists of three prediction engines that work, respectively, on the following three progressively deeper layers: (1) identifying a query protein as secretory or non-secretory with an ensemble classifier formed by fusing many individual OET-KNN (optimized evidence-theoretic K nearest neighbor) classifiers operating in PseAA (pseudo amino acid) composition spaces of various dimensions; (2) selecting a set of candidates for the possible signal peptide cleavage sites of a query secretory protein with a subsite-coupled discrimination algorithm; (3) determining the final cleavage site by fusing, through a voting system, the global sequence alignment outcomes for each of these candidates. Signal-3L features high prediction success rates and short computational time, and hence is particularly useful for analyzing large-scale datasets.
Signal-3L is freely available as a web server at http://chou.med.harvard.edu/bioinf/Signal-3L/ or http://202.120.37.186/bioinf/Signal-3L. To further support related research, the site also provides a downloadable file of the signal peptides identified by Signal-3L for all Swiss-Prot entries that either lack signal peptide annotations or are annotated with uncertain terms, but that Signal-3L classifies as secretory proteins. This large-scale file is prepared in Microsoft Excel, is named "Tab-Signal-3L.xls", and will be updated once a year to include new protein entries and reflect the continuing development of Signal-3L.
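
The first layer described in the abstract (an ensemble of nearest-neighbor classifiers, each working in a PseAA composition space of a different dimension, fused into one decision) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration: plain majority-vote KNN stands in for OET-KNN, a single standardized Kyte-Doolittle hydropathy scale stands in for the physicochemical properties used to build the sequence-order terms, and the names `pseaa`, `knn_vote`, `ensemble_predict`, the weight `w`, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumption: used here as a stand-in property).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

# Standardize the scale to zero mean and unit standard deviation over the 20 residues.
_mean = sum(KD.values()) / len(KD)
_sd = math.sqrt(sum((v - _mean) ** 2 for v in KD.values()) / len(KD))
H = {aa: (v - _mean) / _sd for aa, v in KD.items()}


def pseaa(seq, lam, w=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector."""
    residues = [aa for aa in seq.upper() if aa in H]
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    counts = Counter(residues)
    freqs = [counts[aa] / n for aa in AMINO_ACIDS]
    # k-th tier sequence-order correlation: mean squared property difference
    # between residues that are k positions apart.
    thetas = [
        sum((H[residues[i]] - H[residues[i + k]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def knn_vote(train, query_vec, k):
    """Plain KNN: majority label among the k nearest training vectors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query_vec))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def ensemble_predict(train_seqs, query_seq, lambdas=(0, 1, 2, 3), k=3):
    """Fuse one KNN decision per PseAA dimension (20 + lam) by majority vote."""
    votes = Counter()
    for lam in lambdas:
        train = [(pseaa(s, lam), label) for s, label in train_seqs]
        votes[knn_vote(train, pseaa(query_seq, lam), k)] += 1
    return votes.most_common(1)[0][0]


# Toy, made-up sequences for demonstration only; not real annotated proteins.
TOY_TRAIN = [
    ("MKLLLLALLAVVAAAS", "secretory"),
    ("MALLVLLAVLAGAAQA", "secretory"),
    ("MDEKRDEEKRDDEKSE", "non-secretory"),
    ("MSDEEKKRRDEDKEEQ", "non-secretory"),
]

print(ensemble_predict(TOY_TRAIN, "MRLLALLLVAAGASAA", k=3))
```

Varying `lam` changes the dimension of the feature space, so each base classifier sees a different mix of composition and sequence-order information; the vote then plays the role of the fusion step. The second and third layers (candidate cleavage-site selection and alignment-based voting) are not covered by this sketch.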
