

DeepDBS: Identification of DNA-binding sites in protein sequences by using deep representations and random forest.

Author information

Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Punjab 54770, Pakistan.

Department of Computer Engineering, College of Computer, Qassim University, Buraydah, Saudi Arabia.

Publication information

Methods. 2024 Nov;231:26-36. doi: 10.1016/j.ymeth.2024.09.004. Epub 2024 Sep 11.

Abstract

Interactions among biological molecules are considered primary factors in the life cycle of an organism. Many important biological functions depend on such interactions, and among them protein-DNA interactions are essential for transcription, regulation of gene expression, and DNA repair and packaging. Knowledge of these interactions and of the sites at which they occur is therefore necessary for studying the mechanisms of various biological processes. Because experimental identification through biological assays is resource-demanding, costly and error-prone, scientists turn to computational methods for efficient and accurate identification of DNA-protein interaction sites. Here we propose a novel and accurate method, DeepDBS, for identifying DNA-binding sites in proteins from their primary amino acid sequences. Deep representations were computed from the protein sequences with a one-dimensional convolutional neural network (1D-CNN), a recurrent neural network (RNN) and a long short-term memory (LSTM) network, and were then used to train a Random Forest classifier. Random Forest with LSTM-based features outperformed the other models as well as existing state-of-the-art methods, with an accuracy of 0.99 on the self-consistency test, 10-fold cross-validation, 5-fold cross-validation and jackknife validation, and 0.92 on independent dataset testing. Based on these results, we conclude that DeepDBS can support accurate and efficient identification of DNA-binding sites (DBS) in proteins.
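The validation schemes named in the abstract (k-fold cross-validation and jackknife validation) can be illustrated with a minimal sketch. It is not the authors' code: the function names (`k_fold_indices`, `jackknife_indices`, `cross_validated_accuracy`, `majority_class`) are hypothetical, the jackknife is assumed to mean leave-one-out, and a trivial majority-class predictor stands in for the Random Forest trained on LSTM-based features.

```python
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_indices(n_samples: int, k: int) -> List[Split]:
    """Split indices 0..n_samples-1 into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold.
    No shuffling or stratification is done in this sketch.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    splits: List[Split] = []
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


def jackknife_indices(n_samples: int) -> List[Split]:
    """Jackknife, assumed here to be leave-one-out: k equals n_samples."""
    return k_fold_indices(n_samples, n_samples)


def cross_validated_accuracy(
    features: Sequence,
    labels: Sequence[int],
    fit_predict: Callable[[list, list, list], list],
    splits: List[Split],
) -> float:
    """Pool predictions over all held-out folds and return overall accuracy."""
    correct = 0
    total = 0
    for train, test in splits:
        preds = fit_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in test],
        )
        correct += sum(p == labels[i] for p, i in zip(preds, test))
        total += len(test)
    return correct / total


def majority_class(train_x: list, train_y: list, test_x: list) -> list:
    """Placeholder classifier (an assumption): always predicts the most
    frequent training label. A real pipeline would fit a Random Forest
    on the deep representations here instead."""
    most_common = max(sorted(set(train_y)), key=train_y.count)
    return [most_common] * len(test_x)


if __name__ == "__main__":
    toy_features = [[0.1], [0.2], [0.3], [0.9]]  # made-up feature vectors
    toy_labels = [0, 0, 0, 1]                    # 1 = binding, 0 = non-binding
    acc = cross_validated_accuracy(
        toy_features, toy_labels, majority_class, jackknife_indices(4)
    )
    print(f"jackknife accuracy on toy data: {acc:.2f}")
```

Passing `k_fold_indices(n, 10)` or `k_fold_indices(n, 5)` instead of `jackknife_indices(n)` gives the other two cross-validation settings. A self-consistency test would train and evaluate on the same samples, and an independent test would use a separate held-out dataset.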

