TB-IECS: an accurate machine learning-based scoring function for virtual screening.

Authors

Zhang Xujun, Shen Chao, Jiang Dejun, Zhang Jintu, Ye Qing, Xu Lei, Hou Tingjun, Pan Peichen, Kang Yu

Affiliations

Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.

Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, China.

Publication information

J Cheminform. 2023 Jul 4;15(1):63. doi: 10.1186/s13321-023-00731-x.

Abstract

Machine learning-based scoring functions (MLSFs) have shown potential for improving virtual screening capabilities over classical scoring functions (SFs). Because feature generation is computationally expensive, the number of descriptors used in MLSFs and the characterization of protein-ligand interactions are always limited, which may affect overall accuracy and efficiency. Here, we propose a new SF called TB-IECS (theory-based interaction energy component score), which combines energy terms from Smina and NNScore version 2 and uses the eXtreme Gradient Boosting (XGBoost) algorithm for model training. In this study, the energy terms decomposed from 15 traditional SFs were first categorized based on their formulas and physicochemical principles, and 324 feature combinations were generated accordingly. The five best feature combinations were selected for further evaluation of model performance with respect to feature vector length, interaction types, and ML algorithms. The virtual screening power of TB-IECS was assessed on the DUD-E and LIT-PCBA datasets, as well as on seven target-specific datasets from the ChemDiv database. The results showed that TB-IECS outperformed classical SFs, including Glide SP and Dock, and effectively balanced efficiency and accuracy for practical virtual screening.
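
The abstract does not say which metrics were used to quantify virtual screening power. As an illustration only, the enrichment factor (EF) is one metric commonly used for this purpose: it compares the fraction of actives found among the top-ranked compounds with the fraction of actives in the whole library. The sketch below is a minimal Python version under stated assumptions: the function name `enrichment_factor`, the convention that a higher score means "more likely active", and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    labels: Sequence[int],
    fraction: float = 0.01,
) -> float:
    """Enrichment factor at the top `fraction` of a ranked library.

    Assumption: a HIGHER score means the compound is predicted more
    likely to be active. `labels` holds 1 for actives and 0 for decoys.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Number of compounds in the top slice (at least one).
    n_top = max(1, int(round(n_total * fraction)))

    # Rank compounds from best to worst predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    # Hit rate in the top slice divided by the hit rate of the whole library.
    return (hits_top / n_top) / (n_actives / n_total)


# Hypothetical toy library: 10 compounds, 2 actives, both ranked first.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

ef_top20 = enrichment_factor(toy_scores, toy_labels, fraction=0.2)
ef_top50 = enrichment_factor(toy_scores, toy_labels, fraction=0.5)
```

An EF of 1 corresponds to random ranking, and larger values indicate that actives are concentrated near the top of the list. If the scoring function follows the opposite convention (lower is better, as with many docking energies), the scores would need to be negated before calling this function.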

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0017/10320911/6df0498cfdbc/13321_2023_731_Fig1_HTML.jpg
