

ProSE-Pero: Peroxisomal Protein Localization Identification Model Based on Self-Supervised Multi-Task Language Pre-Training Model.

Affiliations

School of Information Science and Engineering, University of Jinan, 250022 Jinan, Shandong, China.

Laboratory of Zoology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka-shi, 819-0395 Fukuoka, Japan.

Publication information

Front Biosci (Landmark Ed). 2023 Dec 1;28(12):322. doi: 10.31083/j.fbl2812322.

PMID: 38179735
Abstract

BACKGROUND

Peroxisomes are membrane-bound organelles that contain one or more types of oxidative enzymes. Aberrant localization of peroxisomal proteins can contribute to the development of various diseases. To more accurately identify and locate peroxisomal proteins, we developed the ProSE-Pero model.

METHODS

We employed three methods based on deep representation learning models to extract features of peroxisomal proteins and compared their performance. We then used SVMSMOTE to balance the dataset, and used the SHAP interpretation model, analysis of variance (ANOVA), and the light gradient boosting machine (LightGBM) to select and compare the extracted features. We also constructed several traditional machine learning models and four deep learning models, and trained and tested our model on a dataset of 160 peroxisomal proteins using tenfold cross-validation.
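The feature-selection and evaluation steps described above can be illustrated with a small, dependency-free sketch. It assumes the deep-representation features have already been extracted into a numeric matrix `X` (one row per protein) with labels `y`; the helper names `anova_f_score`, `select_top_k` and `stratified_kfold` are hypothetical, and the stratified variant of tenfold cross-validation is an assumption. It covers only ANOVA F-score ranking and fold construction, not SVMSMOTE, SHAP or LightGBM, and is not the authors' code; in practice a library such as scikit-learn (`f_classif`, `StratifiedKFold`) would typically be used instead.

```python
import random
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    classes = sorted(set(labels))
    if len(classes) < 2:
        raise ValueError("ANOVA needs at least two classes")
    groups = [[v for v, lab in zip(values, labels) if lab == c] for c in classes]
    grand_mean = mean(values)
    n, k = len(values), len(classes)
    # Between-class variation: how far each class mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-class variation: spread of samples around their own class mean
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    if ss_within == 0:
        # Perfectly separated or constant feature; avoid dividing by zero
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(X, y, k):
    """Indices of the k feature columns with the highest ANOVA F score."""
    n_features = len(X[0])
    scores = [anova_f_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def stratified_kfold(y, n_splits=10, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class proportions per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    offset = 0
    for c in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == c]
        rng.shuffle(idx)
        # Deal this class's samples round-robin so every fold gets its share
        for i in idx:
            folds[offset % n_splits].append(i)
            offset += 1
    for fold in folds:
        test = sorted(fold)
        test_set = set(test)
        train = [i for i in range(len(y)) if i not in test_set]
        yield train, test
```

Each sample lands in exactly one test fold, so every protein is evaluated once across the ten folds. Feature selection is best fitted inside each training fold to avoid leaking test information into the ranking.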

RESULTS

Our proposed ProSE-Pero model achieves high performance, with a specificity (Sp) of 93.37%, a sensitivity (Sn) of 82.41%, an accuracy (Acc) of 95.77%, a Matthews correlation coefficient (MCC) of 0.8241, an F1 score of 0.8996, and an area under the curve (AUC) of 0.9818. Additionally, we extended our method to the identification of plant vacuole proteins and achieved an accuracy of 91.90% on the independent test set, approximately 5% higher than that of the latest iPVP-DRLF model.
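The metrics listed above follow the standard definitions for binary classification. A minimal sketch, assuming the counts of true positives (TP), false negatives (FN), true negatives (TN) and false positives (FP) are available; `binary_metrics` and `roc_auc` are hypothetical helper names, not taken from the paper. AUC is computed here with the rank-based (Mann-Whitney) formulation, i.e. the probability that a random positive is scored above a random negative, with ties counted as one half.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sn, Sp, Acc, MCC and F1 from confusion-matrix counts."""
    sn = tp / (tp + fn)                        # sensitivity: recall on positives
    sp = tn / (tn + fp)                        # specificity: recall on negatives
    acc = (tp + tn) / (tp + fn + tn + fp)      # fraction of all samples correct
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0  # harmonic mean of precision and Sn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc, "F1": f1}


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loop is quadratic in the number of samples, which is acceptable for a dataset of a few hundred proteins; larger sets would usually sort the scores once instead.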

CONCLUSIONS

Our model outperforms the existing In-Pero model in peroxisomal protein localization and identification. Our study also demonstrates the strong performance of the pre-trained multi-task language model ProSE in extracting features from protein sequences. Given its demonstrated effectiveness and broad generalizability, our model holds considerable potential for future application to the localization and identification of proteins in other organelles, such as mitochondrial and Golgi proteins.

Similar articles

1
ProSE-Pero: Peroxisomal Protein Localization Identification Model Based on Self-Supervised Multi-Task Language Pre-Training Model.
Front Biosci (Landmark Ed). 2023 Dec 1;28(12):322. doi: 10.31083/j.fbl2812322.
2
Identification of plant vacuole proteins by exploiting deep representation learning features.
Comput Struct Biotechnol J. 2022 Jun 8;20:2921-2927. doi: 10.1016/j.csbj.2022.06.002. eCollection 2022.
3
In-Pero: Exploiting Deep Learning Embeddings of Protein Sequences to Predict the Localisation of Peroxisomal Proteins.
Int J Mol Sci. 2021 Jun 15;22(12):6409. doi: 10.3390/ijms22126409.
4
prPred-DRLF: Plant R protein predictor using deep representation learning features.
Proteomics. 2022 Jan;22(1-2):e2100161. doi: 10.1002/pmic.202100161. Epub 2021 Oct 14.
5
Computational Approaches for Peroxisomal Protein Localization.
Methods Mol Biol. 2023;2643:405-411. doi: 10.1007/978-1-0716-3048-8_29.
6
Can Predictive Modeling Tools Identify Patients at High Risk of Prolonged Opioid Use After ACL Reconstruction?
Clin Orthop Relat Res. 2020 Jul;478(7):0-1618. doi: 10.1097/CORR.0000000000001251.
7
Application of machine learning model in predicting the likelihood of blood transfusion after hip fracture surgery.
Aging Clin Exp Res. 2023 Nov;35(11):2643-2656. doi: 10.1007/s40520-023-02550-4. Epub 2023 Sep 21.
8
PeNGaRoo, a combined gradient boosting and ensemble learning framework for predicting non-classical secreted proteins.
Bioinformatics. 2020 Feb 1;36(3):704-712. doi: 10.1093/bioinformatics/btz629.
9
Sequence-based discovery of the human and rodent peroxisomal proteome.
Appl Bioinformatics. 2005;4(2):93-104. doi: 10.2165/00822942-200504020-00003.
10
DeepSSPred: A Deep Learning Based Sulfenylation Site Predictor Via a Novel nSegmented Optimize Federated Feature Encoder.
Protein Pept Lett. 2021;28(6):708-721. doi: 10.2174/0929866527666201202103411.