Feature extraction by statistical contact potentials and wavelet transform for predicting subcellular localizations in gram negative bacterial proteins.

Author information

Arango-Argoty G A, Jaramillo-Garzón J A, Castellanos-Domínguez G

Affiliations

Signal Processing and Recognition Group, Universidad Nacional de Colombia, s. Manizales, Campus La Nubia, km 7 via al Magdalena, Manizales, Colombia; Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, 3501 Fifth Ave, Pittsburgh, PA 15260, USA.

Signal Processing and Recognition Group, Universidad Nacional de Colombia, s. Manizales, Campus La Nubia, km 7 via al Magdalena, Manizales, Colombia; Research Center of the Instituto Tecnologico Metropolitano, Calle 73 No 76A-354, Medellín, Colombia.

Publication information

J Theor Biol. 2015 Jan 7;364:121-30. doi: 10.1016/j.jtbi.2014.08.051. Epub 2014 Sep 16.

Abstract

Predicting the localization of a protein has become a useful practice for inferring its function. Most reported methods for predicting subcellular localization in Gram-negative bacterial proteins rely on standard protein representations that generally do not take into account the distribution of the amino acids or the structural information of the proteins. Here, we propose a protein representation based on the structural information contained in pairwise statistical contact potentials. The wavelet transform decodes the information contained in the primary structure of the proteins, allowing the identification of patterns along the sequence, which are used to characterize the subcellular localizations. A support vector machine classifier is then trained to categorize the proteins. Cellular compartments such as the periplasm and the extracellular medium are difficult to predict and have a high false negative rate. The wavelet-based method achieves high overall performance while maintaining a low false negative rate, particularly on "periplasm" and "extracellular medium". Our results suggest that the proposed protein characterization is a useful alternative to both classical and cutting-edge protein representations for representing protein sequences and predicting their localization.
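The feature-extraction idea in the abstract (turn a sequence into a numeric signal, then apply a wavelet transform to summarize patterns along it) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the `toy_contact_potential` scores are invented placeholders rather than a published contact potential, the signal is built from adjacent residue pairs, the wavelet is a plain Haar transform, and the function names and the per-level energy features are hypothetical choices.

```python
import math

# Placeholder residue grouping. The scores below are NOT real statistical
# contact potentials; they only stand in for a 20x20 lookup table.
HYDROPHOBIC = set("AVILMFWC")


def toy_contact_potential(a, b):
    """Hypothetical pairwise score: lower when both residues are hydrophobic."""
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return -1.0
    if a in HYDROPHOBIC or b in HYDROPHOBIC:
        return -0.5
    return 0.0


def sequence_to_signal(seq):
    """Assumption: one signal value per pair of adjacent residues."""
    return [toy_contact_potential(a, b) for a, b in zip(seq, seq[1:])]


def haar_step(signal):
    """One level of the orthonormal Haar wavelet transform.

    Odd-length inputs are padded by repeating the last sample.
    Returns (approximation, detail) coefficient lists.
    """
    if len(signal) % 2:
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_features(seq, levels=3):
    """Fixed-length vector: mean detail energy per level, then the mean
    energy of the remaining approximation coefficients."""
    approx = sequence_to_signal(seq)
    features = []
    for _ in range(levels):
        if len(approx) < 2:
            # Sequence too short for another decomposition level.
            features.append(0.0)
            continue
        approx, detail = haar_step(approx)
        features.append(sum(d * d for d in detail) / len(detail))
    features.append(sum(a * a for a in approx) / max(len(approx), 1))
    return features


if __name__ == "__main__":
    # Arbitrary example string of one-letter amino acid codes.
    print(wavelet_features("MKKTAIAIAVALAGFATVAQA"))
```

Because the output length depends only on `levels`, proteins of different lengths map to vectors of the same size. Such vectors could then be passed to a support vector machine implementation (for example `sklearn.svm.SVC`) for the classification step the abstract describes.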
