DEIS, IEIIT-CNR, Università di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy.
J Theor Biol. 2010 Jun 7;264(3):1024-32. doi: 10.1016/j.jtbi.2010.03.020. Epub 2010 Mar 20.
In this work, we propose a method for protein classification that combines different texture descriptors extracted from the 2-D distance matrix obtained from the 3-D tertiary structure of a given protein. Rather than considering all atoms in the protein, the distance matrix is calculated using only the atoms that belong to the protein backbone. The positive results reported in this paper offer further experimental confirmation that the distance matrix contains sufficient information for describing a protein. Moreover, we show that combining features extracted from the primary structure with features extracted from the distance matrix increases the performance of our classification system. We demonstrate this by evaluating the performance of an ensemble of classifiers that uses the combined features. The classifiers used in our experiments are support vector machines and random subspace ensembles of support vector machines. The experimental results, validated on three different datasets (protein fold recognition, DNA-binding protein recognition, and biological process and molecular function recognition) with different texture feature extraction methods (variants of local binary patterns, Radon feature transform based approaches, and Haralick descriptors), demonstrate the effectiveness of the proposed approach. Particularly interesting are the results in the classification of 27 types of structural properties: our proposed approach achieves a significant improvement over other reported methods.
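As an illustration of the pipeline described above, the following minimal Python sketch builds a 2-D distance matrix from a list of 3-D backbone atom coordinates and treats it as an image from which a texture descriptor is extracted. It is not the authors' implementation: the function names `distance_matrix` and `lbp_histogram` are hypothetical, the coordinates are made-up values rather than a real structure, and the descriptor is the basic 8-neighbour local binary pattern, which may differ from the variants evaluated in the paper.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between 3-D points (one per backbone atom)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def lbp_histogram(image):
    """256-bin histogram of basic 8-neighbour local binary pattern codes.

    Assumption: plain LBP with no interpolation or uniform-pattern mapping;
    border pixels are skipped because they lack a full neighbourhood.
    """
    # Clockwise neighbour offsets, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as large as the centre.
                if image[r + dr][c + dc] >= image[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    return hist


# Hypothetical backbone coordinates in angstroms (illustrative only).
backbone = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0), (0.0, 0.0, 12.0)]
dm = distance_matrix(backbone)   # symmetric matrix with a zero diagonal
features = lbp_histogram(dm)     # fixed-length vector usable as classifier input
```

Because the histogram has a fixed length regardless of protein size, feature vectors of this kind can be passed to a classifier such as a support vector machine, and concatenated with features derived from the primary structure.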