TargetCPP: accurate prediction of cell-penetrating peptides from optimized multi-scale features using gradient boost decision tree.

Affiliation

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.

Publication information

J Comput Aided Mol Des. 2020 Aug;34(8):841-856. doi: 10.1007/s10822-020-00307-z. Epub 2020 Mar 16.

Abstract

Cell-penetrating peptides (CPPs) are short permeable peptides that have emerged as a tool for delivering therapeutic agents, including genetic material and macromolecules, into cells. CPPs have recently become a hotspot in life science research and have opened a new route to disease treatment that does not harm cell viability, owing to their nontoxic nature. Correct identification of CPPs will therefore provide hints for medical applications. Given the shortcomings of traditional experimental identification of CPPs, an intelligent predictor is urgently needed to identify CPPs accurately among large numbers of uncharacterized sequences. We develop a novel computational method, called TargetCPP, to discriminate CPPs from non-CPPs with improved accuracy. In TargetCPP, the peptide sequences are first formulated with four distinct encoding methods: composite protein sequence representation, composition, transition and distribution, split amino acid composition, and information theory features. These feature vectors are then fused, and the minimum redundancy and maximum relevancy feature selection method is applied to choose an optimal subset of features. Finally, the predictive model is learned with different classification algorithms on the optimized features. Among these classifiers, the gradient boost decision tree algorithm achieved the best performance throughout the experiments. Notably, TargetCPP attained prediction accuracies of 93.54% and 88.28% on the jackknife test and the independent test, respectively. The empirical results indicate that the proposed bioinformatics method outperforms state-of-the-art methods. It is anticipated that the outcomes of this study will provide a strong basis for large-scale prediction of CPPs and instructive guidance for clinical therapy and medical applications.
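To make two of the steps named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It shows one common form of split amino acid composition (separate residue frequencies for the N-terminal, middle, and C-terminal segments) and a jackknife (leave-one-out) accuracy loop. The segment lengths `n_term` and `c_term`, the toy sequences, and their labels are all assumptions made up for illustration. A simple nearest-centroid classifier stands in for the gradient boost decision tree so that the example needs only the standard library.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(segment):
    """Return the 20 residue frequencies of one segment (all zeros if empty)."""
    counts = Counter(segment)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def split_aac(sequence, n_term=5, c_term=5):
    """Encode a peptide as N-terminal + middle + C-terminal compositions.

    n_term and c_term are illustrative lengths, not values from the paper.
    """
    seq = sequence.upper()
    head = seq[:n_term]
    rest = seq[n_term:]
    tail = rest[-c_term:] if c_term > 0 else ""
    middle = rest[:len(rest) - len(tail)]
    return composition(head) + composition(middle) + composition(tail)


def fit_centroids(xs, ys):
    """Stand-in classifier (NOT a GBDT): one mean feature vector per class."""
    centroids = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def jackknife_accuracy(features, labels, fit, predict):
    """Leave-one-out: train on all samples but one, test on the held-out one."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        correct += predict(model, features[i]) == labels[i]
    return correct / len(features)


# Made-up toy peptides with hypothetical labels (1 = CPP-like, 0 = not).
toy_sequences = ["RRKRRQRRKRPPQ", "KRRWKKRRQRRAG", "RKKRRARRKWRRL",
                 "DEEDSGDEATDEE", "EDDSTEEDGADDE", "SDEEGDDETEEDA"]
toy_labels = [1, 1, 1, 0, 0, 0]

features = [split_aac(s) for s in toy_sequences]
accuracy = jackknife_accuracy(features, toy_labels, fit_centroids, predict_centroid)
print(f"feature length: {len(features[0])}, jackknife accuracy: {accuracy:.2f}")
```

In a fuller pipeline, the 60-dimensional vectors would be concatenated with the other encodings, reduced by a feature selection step, and passed to a gradient boosting classifier. The `fit` and `predict` arguments of `jackknife_accuracy` are kept generic so that any such model can be swapped in.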

