Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm.

Author information

Zhao Ziye, Yang Wen, Zhai Yixiao, Liang Yingjian, Zhao Yuming

Affiliations

College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.

International Medical Center, Shenzhen University General Hospital, Shenzhen, China.

Publication information

Front Genet. 2022 Jan 28;12:821996. doi: 10.3389/fgene.2021.821996. eCollection 2021.

Abstract

The study of DNA-binding proteins (DBPs) is an important part of research into biological processes, and much of that research depends on findings about DBPs. The decline of many biological functions is closely related to DBPs. DBPs are generally identified through biochemical experiments, an approach that is inefficient and requires considerable manpower, material resources and time. Several computational approaches have been developed to detect DBPs, among which machine learning (ML)-based techniques have shown excellent performance. Our method uses fewer features and a simpler recognition procedure than other methods while still obtaining satisfactory results. First, we use six feature extraction methods to extract sequence features from the same group of DBPs. Then, the extracted features are concatenated, and the data are standardized. Finally, the extreme gradient boosting (XGBoost) algorithm is used to construct an effective predictive model. Compared with other strong methods, our proposed method achieves better results, with an accuracy of 78.26% on PDB2272 and 85.48% on PDB186. Overall, the accuracy achieved by our strategy is comparable to that of previous detection methods.
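The feature pipeline described in the abstract (extract sequence features, concatenate them, standardize the data) can be illustrated with a minimal sketch. The abstract does not name the six feature extraction methods, so the two descriptors used here (amino acid composition and dipeptide composition) are stand-ins chosen for illustration only, and the function names and toy sequences are hypothetical rather than taken from the paper or from PDB2272/PDB186.

```python
from itertools import product
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Fraction of each standard residue in the sequence (20 values)."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair in the sequence (400 values)."""
    counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    total = len(seq) - 1
    return [counts.get(d, 0) / total for d in DIPEPTIDES]


def extract_features(seq):
    """Concatenate the descriptors into one feature vector (420 values).

    Assumption: two illustrative descriptors stand in for the six
    (unnamed) feature extraction methods mentioned in the abstract.
    """
    return amino_acid_composition(seq) + dipeptide_composition(seq)


def standardize(rows):
    """Column-wise z-score; constant columns are mapped to 0.0."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, mus, sds)]
        for row in rows
    ]


# Toy sequences for illustration only; not drawn from the benchmark datasets.
sequences = ["MKRKAAGLLK", "ACDEFGHIKL", "MSTNPKPQRK"]
X = standardize([extract_features(s) for s in sequences])
```

In a full pipeline, a standardized matrix like `X`, together with binary labels, would then be passed to a gradient-boosted tree classifier; with the `xgboost` Python package this would typically be `XGBClassifier().fit(X, y)`. The paper's actual descriptors and hyperparameters are not given in the abstract.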

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5407/8837382/042ce7c61092/fgene-12-821996-g001.jpg
