Some remarks on protein attribute prediction and pseudo amino acid composition.

Author affiliation

Gordon Life Science Institute, 13784 Torrey Del Mar Drive, San Diego, CA 92130, USA.

Publication information

J Theor Biol. 2011 Mar 21;273(1):236-47. doi: 10.1016/j.jtbi.2010.12.024. Epub 2010 Dec 17.

Abstract

With the completion of human genome sequencing, the number of proteins of known sequence has increased explosively. In contrast, the determination of their biological attributes has proceeded much more slowly. As a consequence, the gap between sequence-known proteins and attribute-known proteins has become increasingly large. This imbalance, which has critically limited our ability to make timely use of newly discovered proteins for basic research and drug development, calls for computational methods or high-throughput automated tools that can quickly and reliably identify various attributes of uncharacterized proteins from their sequence information alone. Indeed, over the last two decades or so, many such methods have been established in the hope of bridging this gap. In developing these methods, the following points often needed to be considered: (1) benchmark dataset construction, (2) protein sample formulation, (3) operating algorithm (or engine), (4) anticipated accuracy, and (5) web-server establishment. In this review, we discuss each of these five procedures, with a special focus on pseudo amino acid composition (PseAAC): its introduction, its different modes and applications, and its recent development, particularly how the general formulation of PseAAC can be used to reflect the core and essential features that are deeply hidden in complicated protein sequences.
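To make step (2), protein sample formulation, more concrete, the following is a minimal sketch of a PseAAC-style feature vector: 20 normalized amino acid frequencies followed by `lam` sequence-order correlation terms. It is an illustration under stated assumptions, not the formulation given in the review. The function name `pseaac`, the defaults `lam=3` and `weight=0.05`, and the use of a single Kyte-Doolittle hydropathy scale as the physicochemical property are all choices made here for illustration; published PseAAC variants may use other properties, several properties at once, and other weights.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index. Assumption: used here as a stand-in
# property scale; it is not taken from the abstract above.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardize(scale):
    """Rescale a property table to zero mean and unit variance over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {a: (scale[a] - mean) / sd for a in AMINO_ACIDS}


def pseaac(seq, lam=3, weight=0.05, scales=(KYTE_DOOLITTLE,)):
    """Return a (20 + lam)-dimensional PseAAC-style vector (hypothetical helper)."""
    seq = seq.upper()
    if any(r not in AMINO_ACIDS for r in seq):
        raise ValueError("sequence contains a non-standard residue")
    if lam >= len(seq):
        raise ValueError("lam must be smaller than the sequence length")

    props = [_standardize(s) for s in scales]

    # Conventional amino acid composition: occurrence frequencies.
    counts = Counter(seq)
    freqs = [counts[a] / len(seq) for a in AMINO_ACIDS]

    # Sequence-order correlation factors for gaps k = 1..lam: the mean
    # squared property difference between residues i and i + k.
    thetas = []
    for k in range(1, lam + 1):
        total = 0.0
        for i in range(len(seq) - k):
            a, b = seq[i], seq[i + k]
            total += sum((p[b] - p[a]) ** 2 for p in props) / len(props)
        thetas.append(total / (len(seq) - k))

    # Joint normalization so that all components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    vector = pseaac("MKTAYIAKQRQISFVKSHFSRQ", lam=3)
    print(len(vector))
```

The first 20 components carry composition information and the last `lam` components carry a coarse trace of sequence order, which plain amino acid composition discards. The `weight` parameter controls how much the order terms contribute relative to the composition terms.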
