School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China.
Engineering Research Center of Internet of Things Applied Technology, Ministry of Education, Wuxi 214122, China.
Int J Mol Sci. 2019 Aug 26;20(17):4175. doi: 10.3390/ijms20174175.
DNA-binding proteins play an important role in cell metabolism. In biological laboratories, DNA-binding proteins are detected with yeast one-hybrid methods, bacterial one-hybrid methods, X-ray crystallography and other techniques, but these methods require considerable labor, materials and time. In recent years, many computational approaches have been proposed to detect DNA-binding proteins. In this paper, a machine learning-based method, called the Fuzzy Kernel Ridge Regression model based on Multi-View Sequence Features (FKRR-MVSF), is proposed to identify DNA-binding proteins. First, multi-view sequence features are extracted from protein sequences. Next, a Multiple Kernel Learning (MKL) algorithm is employed to combine the multiple features. Finally, a Fuzzy Kernel Ridge Regression (FKRR) model is built to detect DNA-binding proteins. Compared with other methods, our model achieves good results: it obtains accuracies of 83.26% and 81.72% on two benchmark datasets (PDB1075 and PDB186), respectively.
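The three steps above (one kernel per feature view, a weighted kernel combination, and a ridge regression in which each training sample carries a fuzzy membership) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the feature extractors, the MKL algorithm or the membership function, so the RBF kernel, the kernel-target alignment weights, the distance-to-class-center memberships, and all function names and parameters (`gamma`, `lam`, `floor`) are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel between the rows of A and B (assumed kernel choice)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def alignment_weights(kernels, y):
    """Stand-in for MKL: weight each view by its kernel-target alignment."""
    target = np.outer(y, y)
    scores = []
    for K in kernels:
        a = (K * target).sum() / (np.linalg.norm(K) * np.linalg.norm(target))
        scores.append(max(a, 1e-12))  # keep weights strictly positive
    w = np.array(scores)
    return w / w.sum()

def combine_kernels(kernels, weights):
    """Convex combination of the per-view kernel matrices."""
    return sum(w * K for w, K in zip(weights, kernels))

def fuzzy_memberships(K, y, floor=0.1):
    """Hypothetical membership: samples far from their own class center
    (measured in kernel feature space) receive a lower weight."""
    s = np.empty(len(y), dtype=float)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        Kc = K[np.ix_(idx, idx)]
        d2 = np.diag(Kc) - 2.0 * Kc.mean(axis=1) + Kc.mean()
        d = np.sqrt(np.maximum(d2, 0.0))
        s[idx] = 1.0 - d / (d.max() + 1e-12)
    return np.clip(s, floor, 1.0)

def fit_fuzzy_krr(K, y, s, lam=1.0):
    """Minimize sum_i s_i * (y_i - f(x_i))^2 + lam * ||f||^2.
    The closed-form solution is alpha = (K + lam * diag(1/s))^{-1} y."""
    return np.linalg.solve(K + lam * np.diag(1.0 / s), y)

def predict(K_test_train, alpha):
    """Label a protein as DNA-binding (+1) or non-binding (-1)."""
    return np.sign(K_test_train @ alpha)
```

In use, each feature view of the training proteins would yield one kernel matrix, the combined kernel and the labels (+1 for DNA-binding, -1 otherwise) would give the memberships, and `fit_fuzzy_krr` would return the dual coefficients applied to test-versus-train kernel values in `predict`. Down-weighting samples with low membership is one way to reduce the influence of outliers or noisy labels on the regression.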