TargetCLP: clathrin proteins prediction combining transformed and evolutionary scale modeling-based multi-view features via weighted feature integration approach.

Author information

Ullah Matee, Akbar Shahid, Raza Ali, Khan Kashif Ahmad, Zou Quan

Affiliations

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China.

Department of Computer Science, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan.

Publication information

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf026.

DOI:10.1093/bib/bbaf026

PMID:39844339

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11753890/

Abstract

Clathrin proteins, key elements of the vesicle coat, play a crucial role in various cellular processes, including neural function, signal transduction, and endocytosis. Disruptions in clathrin protein function have been associated with a wide range of diseases, such as Alzheimer's disease, neurodegeneration, viral infection, and cancer. Correctly identifying clathrin proteins is therefore critical to unraveling the mechanisms of these diseases and to designing drug targets. This paper presents a novel computational method, named TargetCLP, to identify clathrin proteins. TargetCLP uses four single-view feature representations: two transformed feature sets (PSSM-CLBP and RECM-CLBP), one feature set based on qualitative characteristics, and one deep-learning-based embedding obtained with evolutionary scale modeling (ESM). The single-view features are integrated according to weights determined by differential evolution, and the BTG feature selection algorithm is then applied to obtain a reduced, more informative feature subset. Several classifiers were trained on the selected features, among which the proposed SnBiLSTM performed particularly well. Experimental and comparative results on both the training and independent datasets show that TargetCLP improves both prediction accuracy and generalization to unseen data.
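The weighted integration step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the fitness function, the differential evolution variant, or how the weights are applied, so everything below is an assumption made for illustration: each view is scaled by its weight before concatenation, fitness is the training accuracy of a toy nearest-centroid classifier, the optimizer is a plain DE/rand/1/bin loop, and the data are synthetic. All function names are hypothetical and are not taken from TargetCLP.

```python
import random

def weighted_concat(views, weights):
    """Scale each single-view feature vector by its weight, then concatenate."""
    return [w * x for view, w in zip(views, weights) for x in view]

def centroid_accuracy(samples, labels, weights):
    """Toy fitness (assumption): training accuracy of a nearest-centroid classifier."""
    fused = [weighted_concat(views, weights) for views in samples]
    centroids = {}
    for cls in set(labels):
        rows = [f for f, y in zip(fused, labels) if y == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(f):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(f, centroids[c])))

    return sum(predict(f) == y for f, y in zip(fused, labels)) / len(labels)

def differential_evolution(fitness, dim, pop_size=12, generations=30,
                           F=0.5, CR=0.9, seed=0):
    """Plain DE/rand/1/bin that maximises `fitness` over weights in [0, 1]^dim."""
    rng = random.Random(seed)
    # Seed the population with uniform weights so the result never scores below that baseline.
    pop = [[1.0] * dim] + [[rng.random() for _ in range(dim)]
                           for _ in range(pop_size - 1)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = [
                min(1.0, max(0.0, a[k] + F * (b[k] - c[k])))
                if (rng.random() < CR or k == j_rand) else pop[i][k]
                for k in range(dim)
            ]
            s = fitness(trial)
            if s >= scores[i]:  # greedy selection: keep the trial only if it is no worse
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

def make_toy_data(n=40, seed=1):
    """Synthetic stand-in for four views: one informative, three noisy."""
    rng = random.Random(seed)
    samples, labels = [], []
    for i in range(n):
        y = i % 2
        informative = [y + rng.gauss(0, 0.3) for _ in range(3)]
        noisy = [[rng.gauss(0, 2.0) for _ in range(3)] for _ in range(3)]
        samples.append([informative] + noisy)
        labels.append(y)
    return samples, labels

samples, labels = make_toy_data()
baseline = centroid_accuracy(samples, labels, [1.0] * 4)
best_w, best_acc = differential_evolution(
    lambda w: centroid_accuracy(samples, labels, w), dim=4)
print("uniform-weight accuracy:", baseline)
print("DE weights:", [round(w, 2) for w in best_w], "accuracy:", best_acc)
```

In a realistic setting the fitness would more plausibly be a cross-validated score of the downstream classifier rather than training accuracy, and the four views would be the PSSM-CLBP, RECM-CLBP, qualitative-characteristics, and ESM feature vectors instead of synthetic data.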

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0092/11753890/c16480901fb4/bbaf026f1.jpg
