DNA protein binding recognition based on lifelong learning.

Affiliations

School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.

Gusu School, Nanjing Medical University, Suzhou, Jiangsu, China.

Publication information

Comput Biol Med. 2023 Sep;164:107094. doi: 10.1016/j.compbiomed.2023.107094. Epub 2023 Jun 16.

DOI:10.1016/j.compbiomed.2023.107094

PMID:37459792

Abstract

In recent years, bioinformatics research has focused on predicting protein properties from raw protein sequences, and some researchers treat DNA-binding protein prediction as a classification task. Many statistical and machine learning-based methods have been widely used in DNA-binding protein research. These methods are more efficient than manual classification, but there is still room for improvement in prediction accuracy and speed. In this study, the researchers used Average Blocks, Discrete Cosine Transform, Discrete Wavelet Transform, Global encoding, Normalized Moreau-Broto Autocorrelation, and Pseudo position-specific scoring matrix to extract evolutionary features. They then proposed a dynamic deep network based on a lifelong learning architecture to fuse these six features and thus classify DNA-binding proteins more efficiently. Multi-feature fusion describes the relevant protein information more accurately than any single feature. The model offers a fresh perspective on binary classification problems in bioinformatics and broadens the application field of lifelong learning. To demonstrate the model's effectiveness, the researchers ran experiments on three datasets and compared it with other classification techniques. The results showed that the model outperformed other approaches in single-sample specificity (81.0%, 83.0%) and single-sample sensitivity (82.4%, 90.7%), and achieved high accuracy on the benchmark datasets (88.4%, 80.0%, and 76.6%).
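To make two of the named feature extractors and the reported metrics more concrete, here is a minimal sketch, assuming the common textbook definitions of Average Blocks and PsePSSM over an L x 20 position-specific scoring matrix (PSSM). It is not the authors' implementation: the function names `average_blocks`, `pse_pssm` and `binary_metrics` are hypothetical, PSSM normalization is omitted, the block count and lag are illustrative defaults, and plain concatenation stands in for feature fusion only as a simple baseline. The paper itself fuses features with a lifelong learning network.

```python
from typing import Dict, List

# Hypothetical representation: L rows (residues) x n columns (20 amino acids in a real PSSM).
Matrix = List[List[float]]


def average_blocks(pssm: Matrix, n_blocks: int = 20) -> List[float]:
    """Average Blocks: split the L rows into n_blocks contiguous segments
    and average every column within each segment (n_blocks * n features)."""
    length, n_cols = len(pssm), len(pssm[0])
    if length < n_blocks:
        raise ValueError("sequence is shorter than the number of blocks")
    features: List[float] = []
    for b in range(n_blocks):
        # Integer boundaries so every row falls into exactly one non-empty block.
        start = b * length // n_blocks
        end = (b + 1) * length // n_blocks
        block = pssm[start:end]
        for j in range(n_cols):
            features.append(sum(row[j] for row in block) / len(block))
    return features


def pse_pssm(pssm: Matrix, max_lag: int = 1) -> List[float]:
    """PsePSSM: column means, plus for each lag g the mean squared difference
    between rows i and i + g in each column (n + n * max_lag features)."""
    length, n_cols = len(pssm), len(pssm[0])
    if length <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    features = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    for g in range(1, max_lag + 1):
        for j in range(n_cols):
            diffs = [(pssm[i][j] - pssm[i + g][j]) ** 2 for i in range(length - g)]
            features.append(sum(diffs) / (length - g))
    return features


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy, with label 1 = DNA-binding.
    Assumes both classes are present in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


if __name__ == "__main__":
    # Toy 4-residue matrix with 2 columns instead of 20, purely for illustration.
    toy = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
    # Naive concatenation as a stand-in for fusion (not the paper's method).
    fused = average_blocks(toy, n_blocks=2) + pse_pssm(toy, max_lag=1)
    print(fused)  # [2.0, 1.0, 6.0, 5.0, 4.0, 3.0, 4.0, 4.0]
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
    # {'sensitivity': 0.5, 'specificity': 1.0, 'accuracy': 0.75}
```

Sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP), so the two metrics reported in the abstract measure how well positives and negatives, respectively, are recognized. Accuracy pools both classes.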

Similar articles

DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information.

J Comput Aided Mol Des. 2019 Jul;33(7):645-658. doi: 10.1007/s10822-019-00207-x. Epub 2019 May 23.

Improved detection of DNA-binding proteins via compression technology on PSSM information.

PLoS One. 2017 Sep 29;12(9):e0185587. doi: 10.1371/journal.pone.0185587. eCollection 2017.

Target-DBPPred: An intelligent model for prediction of DNA-binding proteins using discrete wavelet transform based compression and light eXtreme gradient boosting.

Comput Biol Med. 2022 Jun;145:105533. doi: 10.1016/j.compbiomed.2022.105533. Epub 2022 Apr 16.

Identifying Membrane Protein Types Based on Lifelong Learning With Dynamically Scalable Networks.

Front Genet. 2022 Mar 14;12:834488. doi: 10.3389/fgene.2021.834488. eCollection 2021.

CrystalM: A Multi-View Fusion Approach for Protein Crystallization Prediction.

IEEE/ACM Trans Comput Biol Bioinform. 2021 Jan-Feb;18(1):325-335. doi: 10.1109/TCBB.2019.2912173. Epub 2021 Feb 3.

TargetDBP: Accurate DNA-Binding Protein Prediction Via Sequence-Based Multi-View Feature Learning.

IEEE/ACM Trans Comput Biol Bioinform. 2020 Jul-Aug;17(4):1419-1429. doi: 10.1109/TCBB.2019.2893634. Epub 2019 Jan 18.

Machine learning in computational docking.

Artif Intell Med. 2015 Mar;63(3):135-52. doi: 10.1016/j.artmed.2015.02.002. Epub 2015 Feb 16.

circRNA-binding protein site prediction based on multi-view deep learning, subspace learning and multi-view classifier.

Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab394.

PDRLGB: precise DNA-binding residue prediction using a light gradient boosting machine.

BMC Bioinformatics. 2018 Dec 31;19(Suppl 19):522. doi: 10.1186/s12859-018-2527-1.
