Ullah Matee, Akbar Shahid, Raza Ali, Khan Kashif Ahmad, Zou Quan
Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China.
Department of Computer Science, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf026.
Clathrin proteins, key elements of the vesicle coat, play a crucial role in various cellular processes, including neural function, signal transduction, and endocytosis. Disruptions in clathrin protein function have been associated with a wide range of diseases, such as Alzheimer's disease, neurodegeneration, viral infection, and cancer. Correctly identifying clathrin proteins is therefore critical for unraveling the mechanisms of these fatal diseases and for designing drug targets. This paper presents a novel computational method, named TargetCLP, to precisely identify clathrin proteins. TargetCLP leverages four single-view feature representations: two transformed feature sets (PSSM-CLBP and RECM-CLBP), one qualitative-characteristics feature set, and one deep-learning-based embedding obtained with ESM. The single-view features are integrated according to weights determined by differential evolution, and the BTG feature selection algorithm is then used to generate a reduced, more discriminative feature subset. The model is trained using various classifiers, among which the proposed SnBiLSTM achieved remarkable performance. Experimental and comparative results on both training and independent datasets show that the proposed TargetCLP offers significant improvements in both prediction accuracy and generalization to unseen data.
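The weighted integration of single-view features can be illustrated with a minimal sketch. The abstract does not specify the fitness function, the differential evolution variant, or its settings, so everything below is an assumption made for illustration: four toy views stand in for PSSM-CLBP, RECM-CLBP, the qualitative-characteristics features, and the ESM embedding; a leave-one-out nearest-centroid classifier stands in for the actual classifier; and the hypothetical helpers `fuse`, `centroid_accuracy`, `differential_evolution`, and `make_toy_data` are not part of TargetCLP.

```python
import random

def fuse(views, weights):
    """Scale each view by its weight, then concatenate into one feature vector."""
    return [w * x for view, w in zip(views, weights) for x in view]

def centroid_accuracy(samples, labels, weights):
    """Leave-one-out accuracy of a nearest-centroid classifier on fused features.
    (Assumed fitness function; the paper's actual objective is not given here.)"""
    fused = [fuse(views, weights) for views in samples]
    correct = 0
    for i, (x, y) in enumerate(zip(fused, labels)):
        best_label, best_dist = None, float("inf")
        for c in sorted(set(labels)):
            members = [v for j, v in enumerate(fused) if labels[j] == c and j != i]
            centroid = [sum(col) / len(members) for col in zip(*members)]
            dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
            if dist < best_dist:
                best_label, best_dist = c, dist
        correct += best_label == y
    return correct / len(samples)

def differential_evolution(fitness, dim, pop_size=12, generations=30,
                           f=0.6, cr=0.9, seed=0):
    """Hand-written DE/rand/1/bin that maximises `fitness` over [0, 1]^dim."""
    rng = random.Random(seed)
    # The first individual is the uniform weighting, so greedy selection
    # guarantees the result is never worse than plain concatenation.
    pop = [[1.0] * dim] + [[rng.random() for _ in range(dim)]
                           for _ in range(pop_size - 1)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = [
                min(1.0, max(0.0, a[k] + f * (b[k] - c[k])))
                if (rng.random() < cr or k == forced) else pop[i][k]
                for k in range(dim)
            ]
            score = fitness(trial)
            if score >= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, score
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

def make_toy_data(n_per_class=10, seed=1):
    """Synthetic data: view 0 carries class signal, views 1-3 are pure noise."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(label, 0.5) for _ in range(3)]
            noise = [[rng.gauss(0.0, 2.0) for _ in range(3)] for _ in range(3)]
            samples.append([informative] + noise)
            labels.append(label)
    return samples, labels

samples, labels = make_toy_data()
baseline = centroid_accuracy(samples, labels, [1.0] * 4)
best_weights, best_score = differential_evolution(
    lambda w: centroid_accuracy(samples, labels, w), dim=4)
print("uniform-weight accuracy:", baseline)
print("DE-weighted accuracy:   ", best_score)
print("view weights:", [round(w, 2) for w in best_weights])
```

In this sketch the search is free to shrink the weights of the noisy views, which is the intuition behind weighting views before fusion. In practice a library routine such as `scipy.optimize.differential_evolution` would likely replace the hand-written loop, and the fitness would be evaluated with proper cross-validation on real features.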