Predicting disulfide connectivity patterns.

Authors

Lu Chih-Hao, Chen Yu-Ching, Yu Chin-Sheng, Hwang Jenn-Kang

Affiliation

Institute of Bioinformatics, National Chiao Tung University, Hsinchu 30050, Taiwan.

Publication information

Proteins. 2007 May 1;67(2):262-70. doi: 10.1002/prot.21309.

Abstract

Disulfide bonds play an important role in stabilizing protein structure and regulating protein function. The ability to infer disulfide connectivity from protein sequences is therefore valuable in structural modeling and functional analysis. However, predicting disulfide connectivity directly from sequence is challenging because disulfide bonds are nonlocal: the close spatial proximity of the two cysteines that form a disulfide bond does not imply that they are close together in the sequence. Recently, Chen and Hwang (Proteins 2005;61:507-512) treated this problem as a multiclass classification by defining each distinct disulfide pattern as a class. They used multiple support vector machines based on a variety of sequence features to predict the disulfide patterns. Their results compare favorably with those in the literature for a benchmark dataset sharing less than 30% sequence identity. However, because the number of disulfide patterns grows rapidly as the number of disulfide bonds increases, their method performs unsatisfactorily for proteins with a large number of disulfide bonds. In this work, we propose a novel method that represents disulfide connectivity in terms of cysteine pairs instead of disulfide patterns. Because the number of bonding states of a cysteine pair is independent of the number of disulfide bonds, the problem of class explosion is avoided. The bonding states of the cysteine pairs are predicted using support vector machines together with genetic algorithm optimization for feature selection. The complete disulfide patterns are then determined from connectivity matrices constructed from the predicted bonding states of the cysteine pairs. Our approach outperforms the current approaches in the literature.
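The class-explosion argument and the pair-based decoding step can be illustrated with a small sketch. It is not the authors' implementation. It assumes that every cysteine is bonded, so n bonds over 2n cysteines give (2n-1)!! possible patterns, and it assumes that a pattern is read off the connectivity matrix by choosing the complete pairing with the highest total pair score. The abstract does not specify the decoding rule, and the function names and the score matrix below are hypothetical.

```python
def count_patterns(n_bonds):
    """Ways to pair 2n cysteines into n bonds: (2n-1)!! = 1 * 3 * ... * (2n-1)."""
    count = 1
    for k in range(1, 2 * n_bonds, 2):
        count *= k
    return count


def count_pairs(n_bonds):
    """Candidate cysteine pairs among 2n cysteines: C(2n, 2)."""
    n_cys = 2 * n_bonds
    return n_cys * (n_cys - 1) // 2


def perfect_matchings(items):
    """Yield every way to split an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail


def best_pattern(scores):
    """Assumed decoding rule: exhaustive search for the pairing with the
    highest summed pair score. Only practical for small numbers of bonds."""
    indices = list(range(len(scores)))
    return max(perfect_matchings(indices),
               key=lambda m: sum(scores[i][j] for i, j in m))


# Hypothetical symmetric connectivity matrix for four cysteines; entry [i][j]
# stands in for a classifier's bonded-state score for the pair (i, j).
scores = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.3, 0.2],
    [0.2, 0.3, 0.0, 0.8],
    [0.1, 0.2, 0.8, 0.0],
]

for n in range(2, 6):
    print(n, "bonds:", count_patterns(n), "patterns,", count_pairs(n), "pairs")
print("decoded pattern:", best_pattern(scores))
```

Under these assumptions the number of pattern classes grows as a double factorial, while each cysteine pair is always a two-state (bonded or not bonded) problem no matter how many bonds the protein has. For larger proteins, the exhaustive search could be replaced by a maximum-weight matching algorithm.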
