Ion channel classification through machine learning and protein language model embeddings.

Authors

Ghazikhani Hamed, Butler Gregory

Affiliation

Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada.

Publication information

J Integr Bioinform. 2024 Nov 25;21(4). doi: 10.1515/jib-2023-0047. eCollection 2024 Dec 1.

Abstract

Ion channels are critical membrane proteins that regulate ion flux across cellular membranes, influencing numerous biological functions. The resource-intensive nature of traditional wet lab experiments for ion channel identification has led to an increasing emphasis on computational techniques. This study extends our previous work on protein language models for ion channel prediction, significantly advancing the methodology and performance. We employ a comprehensive array of machine learning algorithms, including k-Nearest Neighbors, Random Forest, Support Vector Machines, and Feed-Forward Neural Networks, alongside a novel Convolutional Neural Network (CNN) approach. These methods leverage fine-tuned embeddings from ProtBERT, ProtBERT-BFD, and MembraneBERT to differentiate ion channels from non-ion channels. Our empirical findings demonstrate that TooT-BERT-CNN-C, which combines features from ProtBERT-BFD and a CNN, substantially surpasses existing benchmarks. On our original dataset, it achieves a Matthews Correlation Coefficient (MCC) of 0.8584 and an accuracy of 98.35 %. More impressively, on a newly curated, larger dataset (DS-Cv2), it attains an MCC of 0.9492 and an ROC AUC of 0.9968 on the independent test set. These results not only highlight the power of integrating protein language models with deep learning for ion channel classification but also underscore the importance of using up-to-date, comprehensive datasets in bioinformatics tasks. Our approach represents a significant advancement in computational methods for ion channel identification, with potential implications for accelerating research in ion channel biology and aiding drug discovery efforts.
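
The abstract reports both accuracy and the Matthews Correlation Coefficient (MCC). The sketch below shows how the two metrics are computed from a binary confusion matrix and why they can diverge. The function names and all counts are hypothetical and chosen only for illustration. They are not taken from the paper, and the abstract does not state the class ratio of either dataset.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix.

    Returns 0.0 when the denominator is zero, a common convention
    for the degenerate case where a row or column of the matrix is empty.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical, imbalanced test set: 45 ion channels, 955 non-ion channels.
# These counts are illustrative assumptions, not results from the study.
good = dict(tp=40, tn=950, fp=5, fn=5)

# A trivial classifier that labels every protein as a non-ion channel.
trivial = dict(tp=0, tn=955, fp=0, fn=45)

print(f"good:    accuracy={accuracy(**good):.4f}  MCC={mcc(**good):.4f}")
print(f"trivial: accuracy={accuracy(**trivial):.4f}  MCC={mcc(**trivial):.4f}")
```

With these assumed counts, the trivial classifier still reaches an accuracy above 0.95 while its MCC is 0, which is one reason MCC is often reported alongside accuracy for classification tasks where one class may be much rarer than the other.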

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2669/11698620/7a47978ebeee/j_jib-2023-0047_fig_001.jpg
