IdentPMP: identification of moonlighting proteins in plants using sequence-based learning models.

Authors

Liu Xinyi, Shen Yueyue, Zhang Youhua, Liu Fei, Ma Zhiyu, Yue Zhenyu, Yue Yi

Affiliation

School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China.

Publication

PeerJ. 2021 Aug 6;9:e11900. doi: 10.7717/peerj.11900. eCollection 2021.

DOI:10.7717/peerj.11900
PMID:34434652
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8351581/
Abstract

BACKGROUND

A moonlighting protein is a protein that can perform two or more functions. Current moonlighting protein prediction tools focus mainly on proteins from animals and microorganisms, and the cells and proteins of animals and plants differ, so existing tools may predict plant moonlighting proteins inaccurately. Hence, a benchmark data set and a prediction tool specific to plant moonlighting proteins are needed.

METHODS

This study used selected protein feature classes from a data set constructed in house to develop a web-based prediction tool. We first built a data set of plant proteins and removed redundant sequences. We then performed feature selection, feature normalization and feature dimensionality reduction on the training data. Next, preliminary models built with machine learning methods were used to select the feature classes that performed best in plant moonlighting protein prediction, and the selected features were incorporated into the final prediction tool. Finally, we compared five machine learning methods, used grid search to optimize their parameters, and chose the most suitable method as the final model.
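The normalization and grid-search steps described above can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' code: the toy two-feature data, the k-nearest-neighbour classifier (a simple stand-in for the five methods the study compared), the parameter grid, accuracy as the selection score, and all function names are assumptions made for the example.

```python
import itertools
import random

def min_max_fit(rows):
    """Learn per-column minimum and range from the training rows only."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return lows, spans

def min_max_apply(rows, lows, spans):
    """Rescale rows with statistics learned by min_max_fit."""
    return [[(v - lo) / sp for v, lo, sp in zip(r, lows, spans)] for r in rows]

def knn_predict(train_x, train_y, x, k, p):
    """Majority vote of the k nearest training points (Minkowski-p distance)."""
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(t, x)), y)
        for t, y in zip(train_x, train_y)
    )
    votes = sum(y for _, y in dists[:k])
    return 1 if 2 * votes > k else 0

def cv_accuracy(xs, ys, params, folds=5):
    """Cross-validated accuracy; normalization is refit inside every fold."""
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, len(xs), folds))
        tr_x = [x for i, x in enumerate(xs) if i not in test_idx]
        tr_y = [y for i, y in enumerate(ys) if i not in test_idx]
        lows, spans = min_max_fit(tr_x)
        tr_x = min_max_apply(tr_x, lows, spans)
        for i in test_idx:
            x = min_max_apply([xs[i]], lows, spans)[0]
            correct += knn_predict(tr_x, tr_y, x, **params) == ys[i]
    return correct / len(xs)

def grid_search(xs, ys, grid):
    """Try every parameter combination and keep the best-scoring one."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = cv_accuracy(xs, ys, params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

# Hypothetical toy data: two informative features on very different scales.
rng = random.Random(0)
ys = [i % 2 for i in range(60)]
xs = [[rng.gauss(3.0 * y, 0.5), rng.gauss(50.0 * y, 10.0)] for y in ys]

best_score, best_params = grid_search(xs, ys, {"k": [1, 3, 5], "p": [1, 2]})
print(best_score, best_params)
```

Fitting the normalization inside each fold, rather than once on all data, keeps held-out samples from influencing the scaling. A real comparison would repeat the same search for each candidate method and would likely score on AUPRC or AUC rather than accuracy.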

RESULTS

The prediction results indicated that eXtreme Gradient Boosting (XGBoost) performed best, so it was used as the algorithm for the prediction tool, called IdentPMP (Identification of Plant Moonlighting Proteins). On the independent test set, the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUC) of IdentPMP are 0.43 and 0.68, which are 19.44% (0.43 vs. 0.36) and 13.33% (0.68 vs. 0.60) higher, respectively, than those of state-of-the-art non-plant-specific methods. This further demonstrates that a benchmark data set and a plant-specific prediction tool are required for plant moonlighting protein studies. Finally, we implemented the tool as a web application that users can access freely at http://identpmp.aielab.net/.
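The two reported metrics and the relative improvements can be illustrated with a short sketch. It is not the authors' evaluation code: the labels and scores are made-up toy values, the function names are hypothetical, and AUPRC is estimated here by average precision, which is one common estimator and may differ from the one used in the study. Only the four metric values passed to `relative_gain` come from the abstract.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / hits

def relative_gain(new, baseline):
    """Percentage improvement of `new` over `baseline`."""
    return 100.0 * (new - baseline) / baseline

# Hypothetical toy predictions, for illustration only.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(round(roc_auc(labels, scores), 4))            # 0.7333
print(round(average_precision(labels, scores), 4))  # 0.7222

# Relative improvements recomputed from the values in the abstract.
print(round(relative_gain(0.43, 0.36), 2))  # 19.44 (AUPRC)
print(round(relative_gain(0.68, 0.60), 2))  # 13.33 (AUC)
```

The percentages in the abstract are therefore relative gains over the baseline values, not absolute differences in AUPRC or AUC.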

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6d0/8351581/148f73b0ce8b/peerj-09-11900-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6d0/8351581/8c189a3440ca/peerj-09-11900-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6d0/8351581/92e0e32288e2/peerj-09-11900-g003.jpg


