University of Padua, Padua.
IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):467-75. doi: 10.1109/TCBB.2011.117. Epub 2011 Aug 18.
A reliable method for predicting bacterial virulent proteins has several important applications in research aimed at finding novel drug targets and vaccine candidates and at understanding virulence mechanisms in pathogens. In this work, we study several feature extraction approaches for representing proteins and propose a novel bacterial virulent protein prediction method based on an ensemble of classifiers, where the features are extracted directly from the amino acid sequence and from the evolutionary information of a given protein. We evaluate and compare several ensembles obtained by combining six feature extraction methods with several classification approaches based on two general-purpose classifiers (i.e., Support Vector Machine and a variant of input decimated ensemble) and their random subspace versions. An extensive evaluation was performed according to a blind testing protocol, in which the parameters of the system are optimized on the training set and the system is validated on three different independent data sets. This protocol allows selection of the best-performing system and demonstrates the validity of the proposed method. The results obtained under the blind testing protocol show that, although the best-performing stand-alone method is not the same in every independent data set, the fusion of different methods improves prediction performance in all the tested independent data sets.
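As a rough illustration of the idea summarized above (several sequence-derived feature sets, a random subspace ensemble per feature set, and fusion of the resulting scores), a minimal sketch might look as follows. Everything in it is an assumption made for illustration rather than the method of the paper: the two feature extractors (amino acid and dipeptide composition), the nearest-centroid scorer used as a stand-in for the Support Vector Machine, the averaging (sum rule) fusion, all function and class names, and the synthetic toy sequences.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids (20 features)."""
    seq = seq.upper()
    n = max(len(seq), 1)
    return [seq.count(a) / n for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """Fraction of each ordered pair of adjacent amino acids (400 features)."""
    seq = seq.upper()
    index = {a: i for i, a in enumerate(AMINO_ACIDS)}
    counts = [0.0] * 400
    total = 0
    for a, b in zip(seq, seq[1:]):
        if a in index and b in index:
            counts[index[a] * 20 + index[b]] += 1
            total += 1
    return [c / total for c in counts] if total else counts

class SubspaceCentroid:
    """Nearest-centroid scorer restricted to a subset of feature indices.

    Hypothetical stand-in for a stronger base classifier such as an SVM.
    """
    def __init__(self, dims):
        self.dims = dims

    def fit(self, X, y):
        def mean(rows):
            return [sum(r[d] for r in rows) / len(rows) for d in self.dims]
        self.pos = mean([x for x, label in zip(X, y) if label == 1])
        self.neg = mean([x for x, label in zip(X, y) if label == 0])
        return self

    def score(self, x):
        v = [x[d] for d in self.dims]
        d_pos = sum((a - b) ** 2 for a, b in zip(v, self.pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(v, self.neg))
        total = d_pos + d_neg
        # Normalized to [-1, 1] so scores from different feature sets are
        # comparable; positive means closer to the positive-class centroid.
        return (d_neg - d_pos) / total if total else 0.0

def train_random_subspace(X, y, n_members=10, frac=0.5, seed=0):
    """Train each ensemble member on its own random subset of the features."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    k = max(1, int(frac * n_feat))
    return [SubspaceCentroid(rng.sample(range(n_feat), k)).fit(X, y)
            for _ in range(n_members)]

def ensemble_score(members, x):
    """Average the scores of all members of one random subspace ensemble."""
    return sum(m.score(x) for m in members) / len(members)

def fused_score(models, seq):
    """Sum rule fusion: average the ensemble scores over all feature sets."""
    return sum(ensemble_score(members, extract(seq))
               for extract, members in models) / len(models)

# Synthetic toy data (not real proteins): label 1 = "virulent", 0 = "non-virulent".
train_seqs = ["KKKKAKKKLK", "KKAKKKKLKK", "DDDDADDDLD", "DDADDDDLDD"]
train_labels = [1, 1, 0, 0]

# One random subspace ensemble per feature extraction method.
models = [
    (extract, train_random_subspace([extract(s) for s in train_seqs], train_labels))
    for extract in (aa_composition, dipeptide_composition)
]

query_score = fused_score(models, "KKKLKKAKKK")  # sign gives the predicted class
```

In a blind testing setup such as the one described, the ensemble size, the subspace fraction, and the decision threshold on the fused score would be chosen on the training set only, and the independent data sets would be used solely for the final evaluation.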