A comparative study of family-specific protein-ligand complex affinity prediction based on random forest approach.

Author information

Wang Yu, Guo Yanzhi, Kuang Qifan, Pu Xuemei, Ji Yue, Zhang Zhihang, Li Menglong

Affiliation

College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, People's Republic of China.

Publication information

J Comput Aided Mol Des. 2015 Apr;29(4):349-60. doi: 10.1007/s10822-014-9827-y. Epub 2014 Dec 20.

Abstract

The assessment of binding affinity between ligands and their target proteins plays an essential role in the drug discovery and design process. As an alternative to widely used scoring approaches, machine learning methods have also been proposed for fast prediction of binding affinity, with promising results. However, most of them were developed as all-purpose models that disregard the specific functions of different protein families, even though proteins from different functional families differ in structure and physicochemical features. In this study, we proposed a random forest method to predict protein-ligand binding affinity based on a comprehensive feature set covering protein sequence, binding pocket, ligand structure and intermolecular interaction. Feature processing and compression were carried out separately for each protein family dataset; the results indicate that different features contribute to different models, so an individual representation for each protein family is necessary. Three family-specific models were constructed for three important protein target families: HIV-1 protease, trypsin and carbonic anhydrase. For comparison, two generic models covering diverse protein families were also built. The evaluation results show that the models built on family-specific datasets outperform those built on the generic datasets. On the test sets, the Pearson correlation coefficients (Rp) are 0.740, 0.874 and 0.735, and the Spearman correlation coefficients (Rs) are 0.697, 0.853 and 0.723, for HIV-1 protease, trypsin and carbonic anhydrase, respectively. Comparisons with other methods further demonstrate that individual representation and model construction for each protein family is a more reasonable way to predict the affinity of a particular protein family.
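The per-family figures above are Pearson (Rp) and Spearman (Rs) correlation coefficients, presumably computed between experimental and predicted affinities on each test set. The sketch below shows how such metrics can be computed per family using only the Python standard library. It is not the authors' code: the function names and the record layout `(family, measured, predicted)` are hypothetical, and Rs is computed as the Pearson correlation of average ranks, the usual convention for handling tied values.

```python
import math
from typing import Iterable, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (Rp) between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation (Rs): Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def evaluate_by_family(
    records: Iterable[tuple[str, float, float]],
) -> dict[str, tuple[float, float]]:
    """Hypothetical helper: group (family, measured, predicted) records by
    family and return {family: (Rp, Rs)} computed within each family."""
    groups: dict[str, tuple[list[float], list[float]]] = {}
    for family, measured, predicted in records:
        measured_list, predicted_list = groups.setdefault(family, ([], []))
        measured_list.append(measured)
        predicted_list.append(predicted)
    return {
        family: (pearson_r(m, p), spearman_rs(m, p))
        for family, (m, p) in groups.items()
    }
```

Computing the metrics within each family, rather than over pooled records, matches how the abstract reports one (Rp, Rs) pair per protein family. The model itself (a random forest regressor trained on the family-specific feature set) is not reproduced here.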
