

A SNARE Protein Identification Method Based on iLearnPlus to Efficiently Solve the Data Imbalance Problem.

Authors

Ma Dong, Chen Zhihua, He Zhanpeng, Huang Xueqin

Affiliation

Institute of Computing Science and Technology, Guangzhou University, Guangdong, China.

Publication details

Front Genet. 2022 Jan 28;12:818841. doi: 10.3389/fgene.2021.818841. eCollection 2021.

Abstract

Machine learning has been widely used to solve complex problems in engineering and the sciences, and many machine learning-based methods have achieved good results across different fields. SNAREs (soluble N-ethylmaleimide-sensitive factor attachment protein receptors) are key elements of membrane fusion and are required for the fusion of stable intermediates. They have also been associated with the development of some psychiatric disorders. This study processes the original sequence data with the synthetic minority oversampling technique (SMOTE) to address data imbalance, and uses the iLearnPlus platform to build the most suitable machine learning model for identifying SNARE proteins. The final model (the adaptive skip dipeptide composition descriptor for feature extraction and LightGBM with tuned parameters as the classifier) achieved a sensitivity of 66.67%, specificity of 93.63%, accuracy of 91.33%, and MCC of 0.528 in cross-validation, and a sensitivity of 66.67%, specificity of 93.63%, accuracy of 91.33%, and MCC of 0.528 on the independent dataset. These results demonstrate that this combination performs well in classifying SNARE proteins and outperforms the other methods compared.
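The abstract names three concrete ingredients: the adaptive skip dipeptide composition (ASDC) descriptor, SMOTE oversampling, and the sensitivity/specificity/accuracy/MCC metrics. The sketch below illustrates them in plain Python. It is not the authors' or iLearnPlus's code: `asdc`, `smote`, and `evaluate` are hypothetical helper names, ASDC is assumed to follow its usual definition (every ordered residue pair at any gap, normalized to frequencies), and `smote` is a bare-bones version of the standard SMOTE interpolation rule applied to numeric feature vectors. In practice a library implementation such as imbalanced-learn's `SMOTE` and a LightGBM classifier would be used instead.

```python
from __future__ import annotations

import math
import random
from itertools import product

# The 20 standard amino acids; any other symbol (X, B, gaps, ...) is skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides in a fixed order, so feature vectors line up.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def asdc(sequence: str) -> list[float]:
    """Adaptive skip dipeptide composition (sketch, assumed definition).

    Counts every ordered residue pair (s[i], s[j]) with i < j, i.e. summed
    over all gap sizes, and divides by the total number of such pairs.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            counts[residues[i] + residues[j]] += 1
            total += 1
    if total == 0:  # fewer than two standard residues
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def smote(minority: list[list[float]], n_new: int, k: int = 5,
          seed: int = 0) -> list[list[float]]:
    """Basic SMOTE sketch: x_new = x + lam * (neighbor - x), lam ~ U(0, 1),
    where neighbor is one of x's k nearest minority-class samples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbors by Euclidean distance
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbors)
        lam = rng.random()
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def evaluate(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}
```

A typical flow would encode each sequence with `asdc`, oversample the minority (SNARE) class with `smote` on the training data only so that no synthetic points leak into evaluation, fit the classifier, and report `evaluate` on the held-out predictions.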

