Improving the accuracy of predicting disulfide connectivity by feature selection.

Author information

Department of Bioinformatics, Institute of Image Processing & Pattern Recognition, Shanghai Jiaotong University, 800 Dongchuan Road, Shanghai 200240, China.

Publication information

J Comput Chem. 2010 May;31(7):1478-85. doi: 10.1002/jcc.21433.

Abstract

Disulfide bonds are primary covalent cross-links formed between two cysteine residues in the same or different polypeptide chains, and they play important roles in protein folding and stability. However, computational prediction of disulfide connectivity directly from protein primary sequences is challenging for two reasons: disulfide bonds are nonlocal with respect to sequence, and the number of possible disulfide patterns grows exponentially with the number of cysteine residues. In previous studies, disulfide connectivity prediction was usually performed in a high-dimensional feature space, which can cause a variety of problems in statistical learning, such as the curse of dimensionality, overfitting, and feature redundancy. In this study, we propose an efficient feature selection technique for analyzing the importance of each feature component. Using this approach, we selected the most important features for predicting the connectivity pattern of intra-chain disulfide bonds. Our results show that the high-dimensional features contain redundant information and that prediction performance can be further improved when these features are reduced to a lower-dimensional, more compact space. Our results also indicate that global protein features contribute little to the formation and prediction of disulfide bonds, whereas local sequential and structural information plays an important role. These findings provide important insights for structural studies of disulfide-rich proteins.
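
To make the combinatorial growth mentioned in the abstract concrete, the following is a minimal sketch that counts connectivity patterns. It assumes every cysteine takes part in exactly one intra-chain bond, so that 2n cysteines form n bonds; under that assumption the count is the number of perfect pairings, (2n)! / (2^n n!). The function name `connectivity_patterns` is hypothetical and is not taken from the paper.

```python
from math import factorial


def connectivity_patterns(n_bonds: int) -> int:
    """Count the ways to pair 2 * n_bonds cysteines into n_bonds disulfide bonds.

    Assumption (not from the paper): all cysteines are bonded and every
    bond is intra-chain, so each pattern is a perfect pairing.
    """
    if n_bonds < 0:
        raise ValueError("n_bonds must be non-negative")
    # (2n)! orderings, divided by 2^n (order within each pair)
    # and by n! (order of the pairs themselves).
    return factorial(2 * n_bonds) // (2 ** n_bonds * factorial(n_bonds))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, connectivity_patterns(n))
```

Under this assumption, two bonds admit 3 patterns and five bonds already admit 945, which illustrates why exhaustive enumeration becomes impractical for cysteine-rich proteins.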

