Ensemble learning-based feature selection for phosphorylation site detection.

Authors

Liu Songbo, Cui Chengmin, Chen Huipeng, Liu Tong

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.

Beijing Institute of Control Engineering, China Academy of Space Technology, Beijing, China.

Publication information

Front Genet. 2022 Oct 21;13:984068. doi: 10.3389/fgene.2022.984068. eCollection 2022.

DOI: 10.3389/fgene.2022.984068
PMID: 36338976
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9634105/
Abstract

SARS-CoV-2 is prevalent worldwide, has caused more than six million deaths, and seriously affects human health. At present, there is no specific drug against SARS-CoV-2. Protein phosphorylation offers an important route to understanding the mechanism of SARS-CoV-2 infection. Identifying phosphorylation sites with specific modified residues experimentally is often expensive and time-consuming, so we propose a machine learning method to predict them. Because all existing methods of extracting protein sequence features are knowledge-driven, these features may not be effective for detecting phosphorylation sites when the underlying protein mechanisms are not completely understood. Moreover, redundant features strongly affect how well the model fits. To address these problems, we propose a feature selection method based on ensemble learning, which first extracts protein sequence features based on knowledge, then quantifies an importance score for each feature from the data, and finally uses the subset of important features as the final features to predict phosphorylation sites.
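
The three stages named in the abstract (knowledge-based feature extraction, data-driven importance scoring, selection of an important subset) can be illustrated with a minimal sketch. The abstract does not state which sequence features or which ensemble learner the authors use, so everything below is an assumption made for illustration: amino acid composition stands in for the knowledge-driven features, a bagged ensemble of one-split decision stumps stands in for the ensemble learner, and the peptide windows are synthetic toy data in which the positive class is artificially enriched in proline. All function names are hypothetical.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition_features(window):
    """Knowledge-driven step (assumed): amino acid composition of a window."""
    counts = Counter(window)
    return [counts[a] / len(window) for a in AMINO_ACIDS]

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_stump(X, y, feature_ids):
    """Find the single-threshold split with the largest Gini decrease."""
    parent = gini(y)
    best_feature, best_gain = None, 0.0
    for j in feature_ids:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if parent - child > best_gain:
                best_feature, best_gain = j, parent - child
    return best_feature, best_gain

def ensemble_importance(X, y, n_estimators=100, seed=0):
    """Data-driven step: sum stump gains over bootstrap samples and
    random feature subsets, then normalise the scores to sum to 1."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    subset_size = max(1, int(d ** 0.5))
    scores = [0.0] * d
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]          # bootstrap sample
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        candidates = rng.sample(range(d), subset_size)      # feature subset
        feature, gain = best_stump(Xb, yb, candidates)
        if feature is not None:
            scores[feature] += gain
    total = sum(scores) or 1.0
    return [s / total for s in scores]

def select_top_k(scores, k):
    """Selection step: indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def make_toy_windows(n_per_class=150, length=15, seed=1):
    """Synthetic data only: positive windows are enriched in proline (P)."""
    rng = random.Random(seed)
    windows, labels = [], []
    for label in (1, 0):
        for _ in range(n_per_class):
            residues = [
                "P" if label == 1 and rng.random() < 0.25 else rng.choice(AMINO_ACIDS)
                for _ in range(length)
            ]
            windows.append("".join(residues))
            labels.append(label)
    return windows, labels

windows, labels = make_toy_windows()
X = [composition_features(w) for w in windows]
scores = ensemble_importance(X, labels)
selected = select_top_k(scores, k=5)
selected_names = [AMINO_ACIDS[j] for j in selected]
X_selected = [[row[j] for j in selected] for row in X]   # reduced feature matrix
print(selected_names)
```

The reduced matrix `X_selected` would then be passed to whatever classifier performs the final phosphorylation site prediction; that step is omitted here because the abstract does not specify it.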

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bceb/9634105/d69c1d6042ab/fgene-13-984068-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bceb/9634105/7703af0f8470/fgene-13-984068-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bceb/9634105/d0443e66a294/fgene-13-984068-g002.jpg

