Gu Xingyue, Ding Yijie, Xiao Pengfeng, He Tao
State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.
Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China.
Front Genet. 2022 Nov 23;13:935717. doi: 10.3389/fgene.2022.935717. eCollection 2022.
SNARE proteins are membrane fusion proteins that are crucial for mediating vesicle fusion, and loss of their function can lead to a variety of diseases. An accurate method for identifying SNARE proteins is therefore needed. Through extensive experiments, we developed a binary classification model based on the graph-regularized k-local hyperplane distance nearest neighbor (GHKNN) method. The model extracts protein sequence features with a physicochemical property extraction method and upsamples those features with the synthetic minority oversampling technique (SMOTE). This combination achieves the most accurate identification performance across all protein sequences. Finally, we compare the GHKNN-based model with other classifiers using four metrics: sensitivity (SN), specificity (SP), accuracy (ACC), and Matthews correlation coefficient (MCC). In these experiments, the model performs significantly better than the other classifiers.
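The abstract names the four evaluation metrics without giving their formulas. As a minimal sketch, assuming the standard confusion-matrix definitions of SN, SP, ACC, and MCC (the abstract does not confirm which exact formulas the authors used), they could be computed as follows. The function names `confusion_counts` and `binary_metrics` are hypothetical, and the labels in the usage note are invented for illustration; they are not results from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FN, TN, FP for binary labels (1 = SNARE, 0 = non-SNARE)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def binary_metrics(tp, fn, tn, fp):
    """Return SN, SP, ACC, and MCC using the standard definitions (assumed)."""
    total = tp + fn + tn + fp
    # Sensitivity: fraction of true positives recovered.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true negatives recovered.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Accuracy: fraction of all predictions that are correct.
    acc = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}
```

For example, `binary_metrics(*confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))` evaluates five illustrative predictions. Because MCC uses all four confusion-matrix cells, it is less easily inflated by class imbalance than ACC alone, which is relevant when the positive class is rare enough to call for upsampling.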