In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning.

Authors

Zhao Junnan, Zhu Lu, Zhou Weineng, Yin Lingfeng, Wang Yuchen, Fan Yuanrong, Chen Yadong, Liu Haichun

Affiliation

Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198 Jiangsu, China.

Publication information

Comb Chem High Throughput Screen. 2018;21(9):662-669. doi: 10.2174/1386207322666181220130232.

Abstract

BACKGROUND

Thrombin, the central protease of the vertebrate blood coagulation cascade, is closely linked to cardiovascular disease. The inhibitory constant (Ki) is the most significant property of a thrombin inhibitor.

METHOD

This study used machine learning methods to predict the Ki values of thrombin inhibitors from a large data set. Because machine learning can find non-intuitive regularities in high-dimensional data, it is well suited to building effective predictive models. A total of 6554 descriptors were collected for each compound, and a descriptor selection method was applied to identify the appropriate ones. Four methods, multiple linear regression (MLR), K-nearest neighbors (KNN), gradient boosting regression tree (GBRT), and support vector machine (SVM), were then used to build prediction models from the selected descriptors.
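The abstract does not say which descriptor selection method was used, so the following is only a minimal sketch of one common filter approach: drop near-constant descriptors, then drop any descriptor that is highly correlated with one already kept. It then feeds the reduced set to a plain K-nearest-neighbors regressor, one of the four model types named above. The function names, the thresholds `var_min` and `corr_max`, and the toy data are all assumptions made for illustration, not values from the study.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def select_descriptors(X, var_min=1e-8, corr_max=0.95):
    """Return indices of descriptor columns to keep (hypothetical filter, see lead-in)."""
    columns = [[row[j] for row in X] for j in range(len(X[0]))]
    kept = []
    for j, col in enumerate(columns):
        if statistics.pvariance(col) <= var_min:
            continue  # near-constant descriptor carries no information
        if any(abs(pearson(col, columns[k])) > corr_max for k in kept):
            continue  # redundant with a descriptor that was already kept
        kept.append(j)
    return kept


def knn_predict(X_train, y_train, x, k=3):
    """Predict the mean target of the k nearest training compounds (Euclidean distance)."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return statistics.fmean(y_train[i] for i in order[:k])


# Toy data: 5 compounds x 4 descriptors (made-up numbers, not from the paper).
X = [
    [1.0, 2.0, 5.0, 0.3],
    [2.0, 4.0, 5.0, 0.1],
    [3.0, 6.0, 5.0, 0.9],
    [4.0, 8.0, 5.0, 0.4],
    [5.0, 10.0, 5.0, 0.7],
]
y = [6.1, 6.9, 8.2, 8.8, 10.1]  # hypothetical activity values

kept = select_descriptors(X)  # column 1 duplicates column 0, column 2 is constant
X_sel = [[row[j] for j in kept] for row in X]
prediction = knn_predict(X_sel, y, [3.0, 0.5], k=3)
print(kept, round(prediction, 3))
```

In practice the descriptors would usually be scaled before a distance-based model such as KNN or SVM is fitted; the sketch omits that step to stay short.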

RESULTS

The SVM model performed best among these methods, with R2 = 0.84 and MSE = 0.55 for the training set and R2 = 0.83 and MSE = 0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be used for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
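To make the reported metrics and the y-randomization test concrete, here is a small sketch. It computes R2 and MSE from their standard definitions and then refits a model on shuffled activity values; if the original fit is not a chance correlation, the shuffled fits should score far lower. A one-descriptor least-squares line stands in for the study's SVM, and the data, `n_rounds`, and `seed` are assumptions chosen only for illustration. The abstract does not describe how the authors ran the test.

```python
import random
import statistics


def mse(y_true, y_pred):
    """Mean squared error."""
    return statistics.fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fitted_r2(x, y):
    """Fit y = slope * x + intercept by least squares and return R2 on the same data."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return r2(y, [slope * a + intercept for a in x])


def y_randomization(x, y, n_rounds=100, seed=0):
    """Refit on shuffled targets n_rounds times and return the resulting R2 values."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)  # breaks the descriptor-activity relationship
        scores.append(fitted_r2(x, shuffled))
    return scores


# Toy data: one descriptor with a linear trend plus small deterministic noise.
x = [float(i) for i in range(20)]
y = [0.3 * a + 5.0 + 0.15 * ((i * 7) % 5 - 2) for i, a in enumerate(x)]

true_r2 = fitted_r2(x, y)
random_r2 = y_randomization(x, y)
print(f"true R2 = {true_r2:.2f}, mean shuffled R2 = {statistics.fmean(random_r2):.2f}")
```

A large gap between the true score and the shuffled scores is the outcome a y-randomization test looks for.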
