
Gram-positive and Gram-negative subcellular localization using rotation forest and physicochemical-based features.

Author information

Dehzangi Abdollah, Sohrabi Sohrab, Heffernan Rhys, Sharma Alok, Lyons James, Paliwal Kuldip, Sattar Abdul

Publication information

BMC Bioinformatics. 2015;16 Suppl 4(Suppl 4):S1. doi: 10.1186/1471-2105-16-S4-S1. Epub 2015 Feb 23.

Abstract

BACKGROUND

A protein's function depends on its location in the cell, so predicting protein subcellular localization is an important step towards predicting protein function. Recent studies have shown that using Gene Ontology (GO) for feature extraction can improve prediction performance. However, GO annotations are not available for newly sequenced proteins, and in these cases the prediction performance of GO-based methods degrades significantly.

RESULTS

In this study, we develop a method to effectively employ the physicochemical and evolutionary-based information in the protein sequence. To do this, we propose a segmentation-based feature extraction method that explores potentially discriminatory information based on the physicochemical properties of the amino acids, in order to tackle Gram-positive and Gram-negative subcellular localization. We explore the proposed feature extraction techniques using 10 attributes that were experimentally selected from a wide range of physicochemical attributes. Finally, by applying the Rotation Forest classification technique to the extracted features, we improve Gram-positive and Gram-negative subcellular localization accuracies by up to 3.4% over previous studies that used GO for feature extraction.
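The idea of segmentation-based feature extraction over a physicochemical attribute can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the 10 selected attributes, the number of segments, or the statistic computed per segment, so the Kyte-Doolittle hydropathy index, the near-equal-length split, the per-segment mean, and the names `HYDROPATHY` and `segment_means` are all assumptions made here for illustration.

```python
# Illustrative sketch only: one attribute, equal-length segments, mean per segment.
# Kyte-Doolittle hydropathy index, chosen here as an example attribute (assumption).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def segment_means(sequence, attribute, n_segments=4):
    """Split a protein sequence into n_segments near-equal parts and
    return the mean attribute value of each part (hypothetical helper)."""
    # Map residues to attribute values, skipping unknown symbols such as 'X'.
    values = [attribute[aa] for aa in sequence.upper() if aa in attribute]
    if len(values) < n_segments:
        raise ValueError("sequence is shorter than the number of segments")
    features = []
    for i in range(n_segments):
        # Integer boundaries keep every segment non-empty and cover all residues.
        start = i * len(values) // n_segments
        end = (i + 1) * len(values) // n_segments
        segment = values[start:end]
        features.append(sum(segment) / len(segment))
    return features


# Example: a two-segment hydropathy profile for a made-up sequence.
features = segment_means("MKKLLIAVAVLSAS", HYDROPATHY, n_segments=2)
```

Repeating the call for each chosen attribute and concatenating the results would give one fixed-length feature vector per protein regardless of sequence length, which is what a classifier such as Rotation Forest needs as input.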

CONCLUSION

By proposing a segmentation-based feature extraction method that explores potentially discriminatory information based on the physicochemical properties of the amino acids, and by using the Rotation Forest classification technique, we are able to significantly enhance Gram-positive and Gram-negative subcellular localization prediction accuracies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9171/4347615/d185cd7cca3e/1471-2105-16-S4-S1-1.jpg
