
Deep-STP: a deep learning-based approach to predict snake toxin proteins by using word embeddings.

Author information

Zulfiqar Hasan, Guo Zhiling, Ahmad Ramala Masood, Ahmed Zahoor, Cai Peiling, Chen Xiang, Zhang Yang, Lin Hao, Shi Zheng

Affiliations

Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, China.

Beidahuang Industry Group General Hospital, Harbin, China.

Publication information

Front Med (Lausanne). 2024 Jan 17;10:1291352. doi: 10.3389/fmed.2023.1291352. eCollection 2023.

Abstract

Snake venom contains many toxic proteins that can destroy the circulatory or nervous system of prey. Studies have found that these proteins have the potential to treat cardiovascular and nervous system diseases, so studying them supports the development of related drugs. Traditional biochemical techniques can identify these proteins accurately, but the experiments are costly and time-consuming. Artificial intelligence offers a computational strategy for large-scale screening of snake venom proteins. In this paper, we developed a sequence-based computational method to recognize snake toxin proteins. Specifically, we encoded snake toxin protein sequences with three different feature descriptors, including natural vector and word2vec. Analysis of variance (ANOVA) and the gradient-boosting decision tree (GBDT) algorithm, each combined with incremental feature selection (IFS), were used to optimize the features, and the optimized features were then fed into a deep learning model for training. The model achieved an accuracy of 82.00% in 10-fold cross-validation. Further validation on independent data gave an accuracy of 81.14%, which demonstrates that the model has excellent prediction performance and robustness.
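The feature-optimization step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it only shows the general idea of ranking features by a one-way ANOVA F score and then running incremental feature selection (IFS) over the ranked list. The function names, the toy feature matrix, and the leave-one-out nearest-centroid scorer (used here in place of the paper's deep learning model and 10-fold cross-validation) are all assumptions made for illustration, and the GBDT-based ranking is omitted.

```python
from statistics import mean


def kmer_tokens(sequence, k=3):
    """Split a protein sequence into overlapping k-mers, the "words"
    that a word-embedding model would be trained on (hypothetical helper)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    grand = mean(values)
    n, k = len(values), len(groups)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g) / (n - k)
    return between / within if within > 0 else float("inf")


def rank_features(X, y):
    """Return feature indices sorted by descending F score."""
    scores = [anova_f_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def loo_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier.
    Stand-in scorer only; the paper trains a deep learning model instead."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for lab in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == lab]
            centroids[lab] = [mean(col) for col in zip(*rows)]
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(X[i], centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranking, evaluate):
    """Add ranked features one at a time and keep the best-scoring prefix."""
    best_score, best_subset = -1.0, []
    for top in range(1, len(ranking) + 1):
        subset = ranking[:top]
        score = evaluate([[row[j] for j in subset] for row in X], y)
        if score > best_score:  # strict: prefer the smaller subset on ties
            best_score, best_subset = score, subset
    return best_subset, best_score


# Toy data (made up): feature 0 separates the classes, features 1 and 2 do not.
X = [
    [0.90, 0.2, 0.50],
    [0.80, 0.7, 0.40],
    [0.85, 0.4, 0.60],
    [0.10, 0.3, 0.50],
    [0.20, 0.6, 0.45],
    [0.15, 0.5, 0.55],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = toxin, 0 = non-toxin (hypothetical labels)

ranking = rank_features(X, y)
subset, score = incremental_feature_selection(X, y, ranking, loo_centroid_accuracy)
print("k-mers:", kmer_tokens("MKTLL"))
print("ranking:", ranking, "selected:", subset, "accuracy:", score)
```

In a real pipeline the rows of `X` would be the encoded sequence descriptors, and the scorer passed as `evaluate` would be the cross-validated accuracy of the actual classifier.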

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a94d/10829051/6c3b7c19370a/fmed-10-1291352-g001.jpg
