MLCPP 2.0: An Updated Cell-penetrating Peptides and Their Uptake Efficiency Predictor.

Affiliations

Computational Biology and Bioinformatics Lab, Department of Integrative Biotechnology, College of Biotechnology & Bioengineering, Sungkyunkwan University, Seobu-ro, Jangan-gu, Suwon-si, Gyeonggi-do 16419, Republic of Korea.

Arontier Co., 241 Gangnam-daero, Seocho-gu, Seoul 06735, Republic of Korea.

Publication information

J Mol Biol. 2022 Jun 15;434(11):167604. doi: 10.1016/j.jmb.2022.167604. Epub 2022 Apr 28.

Abstract

Cell-penetrating peptides (CPPs) translocate into the cell as various biologically active conjugates and possess numerous biomedical applications. Several machine learning-based predictors have been proposed in the past, but they mostly focus on identifying only CPPs. We proposed a two-layered predictor in 2018 in order to predict CPPs and their uptake efficiency simultaneously. While MLCPP has gained widespread access to research, further improvements are needed to enhance its practical application. A new version of MLCPP is presented in this study called MLCPP 2.0, an interpretable stacking model that identifies CPPs and their strength of uptake efficiency. We updated the benchmarking dataset, explored 17 different sequence-based feature encoding algorithms, and used seven different conventional machine learning classifiers. With multiple 10-fold cross-validation, we constructed 119 baseline models whose predicted probability values were merged and treated as a new feature vector. In a systematic way, a feature set and a classifier are identified that are optimal for predicting the CPP and uptake efficiency separately. The MLCPP 2.0 model achieved outstanding performance on the independent test set, significantly outperforming the existing state-of-the-art predictors. Hence, we expect that our proposed MLCPP 2.0 will facilitate the design of hypothesis-driven experiments by enabling the discovery of novel CPPs. MLCPP 2.0 is freely accessible at https://balalab-skku.org/mlcpp2/.
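The stacking step described in the abstract (baseline models whose cross-validated probabilities are merged into a new feature vector) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the two encoders, the `CentroidClassifier`, the sequences, and the labels are all hypothetical stand-ins, and the sketch uses 2 folds on 8 toy peptides where the paper reports 10-fold cross-validation over 17 encodings and 7 classifiers (17 x 7 = 119 baseline models).

```python
import math
from typing import Callable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Encoder = Callable[[str], List[float]]

def aac(seq: str) -> List[float]:
    """Amino acid composition: fraction of each of the 20 standard residues."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def charge_fractions(seq: str) -> List[float]:
    """Hypothetical toy encoding: fractions of cationic (K, R) and anionic (D, E) residues."""
    n = len(seq)
    return [sum(seq.count(a) for a in "KR") / n,
            sum(seq.count(a) for a in "DE") / n]

class CentroidClassifier:
    """Toy stand-in for a baseline classifier (assumption, not from the paper).
    Scores a sample by its distances to the two class centroids."""

    def fit(self, X: List[List[float]], y: List[int]) -> "CentroidClassifier":
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x: List[float]) -> float:
        d_neg = math.dist(x, self.centroids[0])
        d_pos = math.dist(x, self.centroids[1])
        # Logistic squash: closer to the positive centroid gives a value above 0.5.
        return 1.0 / (1.0 + math.exp(d_pos - d_neg))

def out_of_fold_probs(encode: Encoder, seqs: List[str],
                      labels: List[int], k: int) -> List[float]:
    """Predict each sample with a model that never saw it during training."""
    X = [encode(s) for s in seqs]
    oof = [0.0] * len(seqs)
    for fold in range(k):
        train = [i for i in range(len(seqs)) if i % k != fold]
        held_out = [i for i in range(len(seqs)) if i % k == fold]
        model = CentroidClassifier().fit([X[i] for i in train],
                                         [labels[i] for i in train])
        for i in held_out:
            oof[i] = model.predict_proba(X[i])
    return oof

def stack_features(encoders: List[Encoder], seqs: List[str],
                   labels: List[int], k: int) -> List[List[float]]:
    """Merge the out-of-fold probabilities of every baseline model into one
    meta-feature vector per sequence (one column per baseline model)."""
    columns = [out_of_fold_probs(e, seqs, labels, k) for e in encoders]
    return [list(row) for row in zip(*columns)]

# Made-up sequences and labels, for illustration only (1 = CPP, 0 = non-CPP).
SEQS = ["RRRRRRRR", "KKRRKKRR", "DEDEDEDE", "GSGSDEGS",
        "RKKRRQRR", "KRKRKRKW", "EEDDSSGG", "ADEGSDEA"]
LABELS = [1, 1, 0, 0, 1, 1, 0, 0]
ENCODERS = [aac, charge_fractions]

meta_features = stack_features(ENCODERS, SEQS, LABELS, k=2)
# The meta-classifier is trained on the merged probabilities, not on raw encodings.
meta_model = CentroidClassifier().fit(meta_features, LABELS)
```

Using out-of-fold predictions matters here: if each baseline model scored the same samples it was trained on, the meta-classifier would learn from overly optimistic probabilities. In a two-layer setup like the one the abstract describes, the same procedure would presumably be repeated with uptake-efficiency labels for sequences predicted to be CPPs.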
