Automated derivation and refinement of sequence length patterns for protein sequences using evolutionary computation.

Authors

Sadowski M I, Parish J H, Westhead D R

Affiliation

Bioinformatics Unit, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK.

Publication details

Biosystems. 2005 Sep;81(3):247-54. doi: 10.1016/j.biosystems.2005.05.001.

DOI:10.1016/j.biosystems.2005.05.001

PMID:16076522

Abstract

Several stratagems are used in protein bioinformatics for the classification of proteins based on sequence, structure or function. We explore the concept of a minimal signature embedded in a sequence that defines the likely position of a protein in a classification. Specifically, we address the derivation of sparse profiles for the G-protein coupled receptor (GPCR) clan of integral membrane proteins. We present an evolutionary algorithm (EA) for the derivation of sparse profiles (signatures) without the need to supply a multiple alignment. We also apply an evolution strategy (ES) to the problem of pattern and profile refinement. Patterns were derived for the GPCR 'superfamily' and GPCR families 1-3 individually from starting populations of randomly generated signatures, using a database of integral membrane protein sequences and an objective function using a modified receiver operator characteristic (ROC) statistic. The signature derived for the family 1 GPCR sequences was shown to perform very well in a stringent cross-validation test, detecting 76% of unseen GPCR sequences at 5% error. Application of the ES refinement method to a signature developed by a previously described method [Sadowski, M.I., Parish, J.H., 2003. Automated generation and refinement of protein signatures: case study with G-protein coupled receptors. Bioinformatics 19, 727-734] resulted in a 6% increase of coverage for 5% error as measured in the validation test. We note that there might be a limit to this or any classification of proteins based on patterns or schemata.
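The headline figure in the abstract, coverage of unseen GPCR sequences at 5% error, can be illustrated with a short calculation. The sketch below is a plain coverage-at-fixed-error computation and is not the modified ROC statistic used in the paper, which the abstract does not define. The function name `coverage_at_error` and the score lists are hypothetical, and it assumes "error" means the fraction of non-family sequences that the signature accepts.

```python
from typing import Sequence


def coverage_at_error(pos_scores: Sequence[float],
                      neg_scores: Sequence[float],
                      max_error: float = 0.05) -> float:
    """Fraction of true family members detected at a fixed error rate.

    Hypothetical helper: picks the strictest score threshold that accepts
    at most `max_error` of the negatives, then reports the fraction of
    positives scoring strictly above that threshold.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")

    # Number of negatives allowed through as false positives
    # (the small epsilon guards against floating-point rounding).
    allowed = int(max_error * len(neg_scores) + 1e-9)

    ranked_neg = sorted(neg_scores, reverse=True)
    # The threshold is the score of the first negative that must be rejected.
    if allowed < len(ranked_neg):
        threshold = ranked_neg[allowed]
    else:
        threshold = float("-inf")

    hits = sum(1 for score in pos_scores if score > threshold)
    return hits / len(pos_scores)
```

With 20 negative scores and `max_error=0.05`, exactly one negative may score above the threshold, so the function reports how many positives outrank the second-highest negative.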
