College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China; Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China.
College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China.
Artif Intell Med. 2019 Jul;98:35-47. doi: 10.1016/j.artmed.2019.07.005. Epub 2019 Jul 19.
Discovering and accurately locating drug targets is of great significance for the research and development of new drugs. As an alternative to traditional drug development, machine learning algorithms can predict drug targets by mining existing data. Because this approach is fast and inexpensive, it has received increasing attention in recent years. In this paper, we propose a novel method for predicting druggable proteins. First, features are extracted from each protein sequence by combining Chou's pseudo amino acid composition (PseAAC), dipeptide composition (DPC) and reduced sequence (RS), yielding a 591-dimensional feature vector for each protein in the drug target dataset. Then, a genetic algorithm (GA) is used to select the most informative features of the druggable protein dataset. Finally, Bagging ensemble learning is applied to an SVM base classifier to obtain the final prediction model. Under 5-fold cross-validation, the prediction accuracy reaches 93.78%, which compares favorably with other state-of-the-art prediction methods. The results indicate that the proposed method has high reference value for predicting potential drug targets and may play a key role in drug research and development. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/GA-Bagging-SVM.
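To make the feature-extraction step concrete, the following is a minimal sketch of dipeptide composition (DPC) only, assuming its standard definition: the relative frequency of each of the 400 ordered pairs of the 20 standard amino acids. The function name `dipeptide_composition`, the choice to skip pairs containing non-standard residues, and the normalization by the number of valid pairs are illustrative assumptions and are not taken from the authors' code; PseAAC, RS, GA selection and the Bagging-SVM model are not shown.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order that defines the feature layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional DPC vector for a protein sequence.

    Assumption (not from the paper): pairs containing a non-standard
    residue are skipped, and frequencies are normalized by the number
    of valid pairs that were counted.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    # Slide a window of width 2 over the sequence.
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or with no valid pairs: return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    features = dipeptide_composition("ACAC")
    print(len(features))
    print(features[DIPEPTIDES.index("AC")])
```

In a full pipeline of the kind the abstract describes, a vector like this would be concatenated with the PseAAC and RS features before feature selection and classification.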