Atasever Sema
Department of Computer Engineering, Faculty of Engineering and Architecture, Nevsehir Haci Bektas Veli University, 50300 Nevşehir, Turkey.
Int J Mol Sci. 2025 Mar 17;26(6):2680. doi: 10.3390/ijms26062680.
Classifying Hepatitis C virus (HCV) NS3 inhibitors is essential for identifying potential antiviral agents through computational methods. This study develops an optimized machine learning (ML) model that uses random forest (RF) and molecular fingerprints to classify HCV NS3 inhibitors accurately. A dataset of 965 molecules was retrieved from the ChEMBL database, and 290 bioactive compounds were selected for model training. Twelve molecular fingerprint descriptors were tested, and the CDK graph-only fingerprint yielded the best performance. RF was also compared in WEKA against other classifiers, including instance-based k-nearest neighbor (IBk), logistic regression (LR), AdaBoost, and OneR, across the various molecular fingerprint descriptors. On the test set, the optimized RF model achieved an accuracy of 89.6552%, a mean absolute error (MAE) of 0.2114, a root mean square error (RMSE) of 0.3304, and a Matthews correlation coefficient (MCC) of 0.7950. These results highlight the effectiveness of optimized molecular fingerprints in enhancing virtual screening (VS) for HCV inhibitors and offer a data-driven approach to drug discovery.
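The four reported metrics can be illustrated with a minimal Python sketch. It is not the study's WEKA pipeline: the labels and predicted probabilities below are hypothetical toy values, the function names are illustrative, and the 0.5 decision threshold is an assumption. MAE and RMSE are computed here on the predicted probability of the active class, which for a two-class problem should match an average taken over both class probabilities.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels: 1 = active, 0 = inactive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified compounds."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def mae(y_true, p_active):
    """Mean absolute error between true labels and predicted P(active)."""
    return sum(abs(t - p) for t, p in zip(y_true, p_active)) / len(y_true)

def rmse(y_true, p_active):
    """Root mean square error between true labels and predicted P(active)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, p_active)) / len(y_true))

# Hypothetical toy data (NOT from the study): true classes and predicted P(active).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
p_active = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.7, 0.3]

# Assumed decision threshold of 0.5 to turn probabilities into class labels.
y_pred = [1 if p >= 0.5 else 0 for p in p_active]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", mcc(*counts))
print("MAE:", mae(y_true, p_active))
print("RMSE:", rmse(y_true, p_active))
```

Accuracy and MCC depend only on the hard class assignments, whereas MAE and RMSE reflect how confident the probability estimates are, which is why a model can report a high accuracy alongside a non-trivial MAE.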