Protein location prediction using atomic composition and global features of the amino acid sequence.

Affiliation

Centre for Bioinformatics, University of Kerala, Kariyavattom Campus, Thiruvananthapuram, Kerala, India.

Publication information

Biochem Biophys Res Commun. 2010 Jan 22;391(4):1670-4. doi: 10.1016/j.bbrc.2009.12.118. Epub 2009 Dec 28.

Abstract

The subcellular location of a protein is important information for determining its function, screening drug candidates, designing vaccines, annotating gene products, and selecting relevant proteins for further study. Computational prediction of subcellular localization predicts the location of a protein from its amino acid sequence. To be more accurate, a computational localization prediction method should exploit all relevant biological features that contribute to subcellular localization. In this work, we extracted biological features from the full-length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is used together with multiple physicochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity to predict the subcellular location of a protein. Support Vector Machines are designed for four modules, and the prediction is made by a weighted voting system. Our system achieves accuracies of 100%, 82.47%, and 88.81% in the self-consistency test, jackknife test, and independent data test, respectively. Our results provide evidence that prediction based on biological features derived from the full-length amino acid sequence gives better accuracy than prediction based on features derived from the N-terminal region alone. Treating the features as a distribution over the entire sequence brings out the underlying property distribution in greater detail and thereby enhances prediction accuracy.
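
The abstract does not define the features or the voting scheme in detail, so the following Python sketch is only an illustration of how such sequence-level features and a weighted vote might be computed. Everything in it is an assumption rather than the authors' method: the function names, the split into three near-equal segments, the use of free amino acid formulas for atom counts (water lost in peptide bonds is ignored), and the module names and weights in the usage note. The SVM training step is omitted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# (C, H, N, O, S) atom counts of each free amino acid (standard formulas).
ATOMS = {
    "A": (3, 7, 1, 2, 0),  "R": (6, 14, 4, 2, 0), "N": (4, 8, 2, 3, 0),
    "D": (4, 7, 1, 4, 0),  "C": (3, 7, 1, 2, 1),  "E": (5, 9, 1, 4, 0),
    "Q": (5, 10, 2, 3, 0), "G": (2, 5, 1, 2, 0),  "H": (6, 9, 3, 2, 0),
    "I": (6, 13, 1, 2, 0), "L": (6, 13, 1, 2, 0), "K": (6, 14, 2, 2, 0),
    "M": (5, 11, 1, 2, 1), "F": (9, 11, 1, 2, 0), "P": (5, 9, 1, 2, 0),
    "S": (3, 7, 1, 3, 0),  "T": (4, 9, 1, 3, 0),  "W": (11, 12, 2, 2, 0),
    "Y": (9, 11, 1, 3, 0), "V": (5, 11, 1, 2, 0),
}


def aa_composition(seq):
    """Fraction of each of the 20 standard residues (others are ignored)."""
    counts = Counter(r for r in seq.upper() if r in ATOMS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def three_part_composition(seq):
    """Composition of three near-equal segments, concatenated (60 values).

    Assumption: the sequence is cut at n/3 and 2n/3; the paper may split
    it differently.
    """
    n = len(seq)
    cuts = [0, n // 3, (2 * n) // 3, n]
    features = []
    for start, end in zip(cuts, cuts[1:]):
        features.extend(aa_composition(seq[start:end]))
    return features


def atomic_composition(seq):
    """Fractions of C, H, N, O and S atoms over the whole sequence."""
    totals = [0, 0, 0, 0, 0]
    for residue in seq.upper():
        if residue in ATOMS:
            for i, n_atoms in enumerate(ATOMS[residue]):
                totals[i] += n_atoms
    grand_total = sum(totals)
    if grand_total == 0:
        return [0.0] * 5
    return [t / grand_total for t in totals]


def weighted_vote(predictions, weights):
    """Return the label with the largest summed weight.

    predictions: {module name: predicted label}
    weights:     {module name: weight}, hypothetical values
    """
    scores = Counter()
    for module, label in predictions.items():
        scores[label] += weights[module]
    return max(scores, key=scores.get)
```

For example, with hypothetical module outputs `{"aac": "nucleus", "atomic": "cytoplasm", "similarity": "nucleus"}` and hypothetical weights `{"aac": 0.3, "atomic": 0.5, "similarity": 0.4}`, `weighted_vote` returns `"nucleus"`, because the two agreeing modules together outweigh the single stronger one.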
