

An ensemble method with hybrid features to identify extracellular matrix proteins.

Authors

Yang Runtao, Zhang Chengjin, Gao Rui, Zhang Lina

Affiliations

School of Control Science and Engineering, Shandong University, Jinan, China.

School of Control Science and Engineering, Shandong University, Jinan, China; School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, China.

Publication information

PLoS One. 2015 Feb 13;10(2):e0117804. doi: 10.1371/journal.pone.0117804. eCollection 2015.

Abstract

The extracellular matrix (ECM) is a dynamic composite of secreted proteins that play important roles in numerous biological processes such as tissue morphogenesis, differentiation and homeostasis. Furthermore, various diseases are caused by the dysfunction of ECM proteins. Therefore, identifying these important ECM proteins may assist in understanding related biological processes and in drug development. In view of the serious imbalance in the training dataset, a Random Forest-based ensemble method with hybrid features is developed in this paper to identify ECM proteins. The hybrid features incorporate sequence composition, physicochemical properties, and evolutionary and structural information. The Information Gain Ratio and Incremental Feature Selection (IGR-IFS) methods are adopted to select the optimal features. The resulting predictor, termed IECMP (Identify ECM Proteins), achieves a balanced accuracy of 86.4% using 10-fold cross-validation on the training dataset, which is much higher than the results obtained by other methods (ECMPRED: 71.0%, ECMPP: 77.8%). Moreover, when tested on a common independent dataset, our method also achieves significantly improved performance over ECMPP and ECMPRED. These results indicate that IECMP is an effective method for ECM protein prediction, with a more balanced prediction capability for positive and negative samples. It is anticipated that the proposed method will provide significant information to fully decipher the molecular mechanisms of ECM-related biological processes and discover candidate drug targets. For public access, we have developed a user-friendly web server for ECM protein identification that is freely accessible at http://iecmp.weka.cc.
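The abstract reports balanced accuracy rather than plain accuracy because the training data are heavily imbalanced. The sketch below illustrates why that matters. It assumes the common definition of balanced accuracy as the mean of sensitivity and specificity, which the abstract itself does not spell out; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive).

    Assumed definition: (TP / (TP + FN) + TN / (TN + FP)) / 2.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Hypothetical toy data: 10 positives (e.g. ECM proteins) and 90 negatives.
y_true = [1] * 10 + [0] * 90

# A degenerate predictor that labels every sample as negative.
all_negative = [0] * 100

plain = accuracy(y_true, all_negative)
balanced = balanced_accuracy(y_true, all_negative)
print(f"accuracy={plain:.2f}, balanced accuracy={balanced:.2f}")
```

On this toy data the all-negative predictor looks strong by plain accuracy while its balanced accuracy sits at chance level, which is the failure mode a balanced metric is meant to expose.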


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/963f/4334504/fc3e3025764b/pone.0117804.g001.jpg
