SSMFN: a fused spatial and sequential deep learning model for methylation site prediction.

Author information

Lumbanraja Favorisen Rosyking, Mahesworo Bharuno, Cenggoro Tjeng Wawan, Sudigyo Digdo, Pardamean Bens

Affiliations

Department of Computer Science, Faculty of Mathematics and Natural Science, University of Lampung, Bandar Lampung, Lampung, Indonesia.

Bioinformatics and Data Science Research Center, Bina Nusantara University, West Jakarta, Jakarta, Indonesia.

Publication information

PeerJ Comput Sci. 2021 Aug 26;7:e683. doi: 10.7717/peerj-cs.683. eCollection 2021.

Abstract

BACKGROUND

Conventional methods for post-translational modification site prediction, such as spectrophotometry, Western blotting, and chromatin immunoprecipitation, can be very expensive and time-consuming. Neural networks (NN) are one computational approach that can effectively predict post-translational modification sites. We developed a neural network model, the Sequential and Spatial Methylation Fusion Network (SSMFN), to predict possible methylation sites on protein sequences.

METHOD

We designed our model to extract both spatial and sequential information from amino acid sequences. A convolutional neural network (CNN) is applied to capture spatial information, while a long short-term memory (LSTM) network is applied to sequential information. The latent representations of the CNN and LSTM branches are then fused. We then compared the performance of the proposed model with that of state-of-the-art methylation site prediction models on balanced and imbalanced datasets.
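
The two-branch design described above can be illustrated with a minimal forward-pass sketch in plain Python. It is not the authors' implementation: the one-hot encoding, the fusion by concatenation, the layer sizes, the random untrained weights, and the example window are all assumptions made for illustration, and a practical model would be built and trained in a deep learning framework.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(window):
    """Encode each residue as a 20-dim one-hot vector (assumed encoding)."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in window]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(rng, n_kernels=4, width=3, hidden=5):
    """Random, untrained weights; every size here is hypothetical."""
    return {
        "width": width,
        "hidden": hidden,
        "kernels": rand_matrix(n_kernels, width * 20, rng),
        "W": rand_matrix(4 * hidden, 20 + hidden, rng),
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(n_kernels + hidden)],
    }


def cnn_branch(x, kernels, width):
    """Spatial branch: 1D convolution, ReLU, then global max pooling."""
    feats = []
    for kernel in kernels:
        acts = []
        for i in range(len(x) - width + 1):
            patch = [v for pos in x[i:i + width] for v in pos]
            acts.append(max(0.0, sum(w * p for w, p in zip(kernel, patch))))
        feats.append(max(acts))
    return feats


def lstm_branch(x, W, hidden):
    """Sequential branch: one LSTM layer (biases omitted), final hidden state."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x_t in x:
        inp = x_t + h  # concatenate input and previous hidden state
        z = [sum(w * v for w, v in zip(row, inp)) for row in W]
        i_gate = [sigmoid(v) for v in z[:hidden]]
        f_gate = [sigmoid(v) for v in z[hidden:2 * hidden]]
        o_gate = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]
        g = [math.tanh(v) for v in z[3 * hidden:]]
        c = [f * cv + i * gv for f, cv, i, gv in zip(f_gate, c, i_gate, g)]
        h = [o * math.tanh(cv) for o, cv in zip(o_gate, c)]
    return h


def predict(window, params):
    """Fuse both latent vectors (here: concatenation) and score the window."""
    x = one_hot(window)
    fused = (cnn_branch(x, params["kernels"], params["width"])
             + lstm_branch(x, params["W"], params["hidden"]))
    logit = sum(w * f for w, f in zip(params["w_out"], fused))
    return sigmoid(logit)


params = init_params(random.Random(0))
window = "GSRGGFGGRGGFRGGRGGG"  # hypothetical 19-residue window
print(predict(window, params))  # probability-like score from untrained weights
```

Concatenation is only one way to fuse the two latent vectors; it is used here because it keeps both branches' features intact for the final classifier.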

RESULTS

Our model performed better on almost all metrics when trained on the balanced training dataset. On the imbalanced training dataset, all of the models performed better, since they were trained on more data. On several metrics, our model also surpassed the PRMePred model, which requires laborious feature extraction and selection.

CONCLUSION

Our model achieved the best performance across different settings on almost all metrics. Our results also suggest that an NN model trained on a balanced training dataset and tested on an imbalanced dataset will offer high specificity and low sensitivity. Thus, an NN model for methylation site prediction should be trained on an imbalanced dataset, since in actual applications there are far more negative samples than positive samples.
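
To make the sensitivity and specificity terms concrete, the sketch below computes them from a confusion matrix on a small imbalanced test set. The labels and predictions are made up for illustration and are not data from the study; the function names are likewise hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of true methylation sites that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Fraction of non-sites that were correctly rejected."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def accuracy(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Made-up imbalanced test set: 2 positive and 8 negative samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn))       # 0.5
print(specificity(tn, fp))       # 0.875
print(accuracy(tp, tn, fp, fn))  # 0.8
```

In this toy case accuracy and specificity look strong because negatives dominate, while half of the positive sites are missed, which is why sensitivity has to be reported separately on imbalanced data.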

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/192c/8409337/9bb91cd213c4/peerj-cs-07-683-g001.jpg
