Dehzangi Abdollah, Sohrabi Sohrab, Heffernan Rhys, Sharma Alok, Lyons James, Paliwal Kuldip, Sattar Abdul
BMC Bioinformatics. 2015;16 Suppl 4(Suppl 4):S1. doi: 10.1186/1471-2105-16-S4-S1. Epub 2015 Feb 23.
The functioning of a protein relies on its location in the cell. Therefore, predicting protein subcellular localization is an important step towards protein function prediction. Recent studies have shown that relying on Gene Ontology (GO) for feature extraction can improve prediction performance. However, GO annotations are not available for newly sequenced proteins, so in these cases the prediction performance of GO-based methods degrades significantly.
In this study, we develop a method that effectively employs physicochemical and evolutionary-based information extracted from the protein sequence. To do this, we propose a segmentation-based feature extraction method that explores potentially discriminatory information based on the physicochemical properties of the amino acids, and we apply it to Gram-positive and Gram-negative subcellular localization. We evaluate the proposed feature extraction techniques using 10 attributes that were experimentally selected from a wide range of physicochemical attributes. Finally, by applying the Rotation Forest classification technique to the extracted features, we improve Gram-positive and Gram-negative subcellular localization accuracies by up to 3.4% over previous studies that used GO for feature extraction.
By proposing a segmentation-based feature extraction method that explores potentially discriminatory information based on the physicochemical properties of the amino acids, and by using the Rotation Forest classification technique, we significantly enhance Gram-positive and Gram-negative subcellular localization prediction accuracies.
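The abstract does not spell out how the segmentation-based features are computed, so the following is only a minimal sketch of the general idea, under stated assumptions: the sequence is split into equal-sized segments, and each segment is summarized by the mean of one physicochemical attribute. The function name `segment_features`, the choice of four segments, and the use of the Kyte-Doolittle hydropathy scale as the attribute are all illustrative assumptions and are not taken from the paper, which selects 10 attributes experimentally and may define its segments differently.

```python
# Kyte-Doolittle hydropathy index, used here as ONE example attribute.
# Assumption: the paper's 10 selected attributes are not listed in the abstract.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def segment_features(sequence, n_segments=4, scale=KYTE_DOOLITTLE):
    """Return one mean attribute value per segment of the sequence.

    Hypothetical helper: equal-sized segments and per-segment means are
    assumptions made for illustration, not the authors' exact definition.
    """
    # Map residues to attribute values, skipping non-standard symbols.
    values = [scale[aa] for aa in sequence.upper() if aa in scale]
    if len(values) < n_segments:
        raise ValueError("sequence is shorter than the number of segments")

    features = []
    for k in range(n_segments):
        # Integer arithmetic keeps the segments contiguous and non-empty.
        start = k * len(values) // n_segments
        end = (k + 1) * len(values) // n_segments
        segment = values[start:end]
        features.append(sum(segment) / len(segment))
    return features


if __name__ == "__main__":
    # Short made-up peptide, for demonstration only.
    print(segment_features("MKKLLAVAGIVLSG", n_segments=2))
```

Repeating this for each selected attribute and concatenating the results would give a fixed-length feature vector regardless of protein length, which is what a classifier such as Rotation Forest needs as input.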