Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
Applied Computational Chemistry Research Unit, Department of Chemistry, School of Science, King Mongkut's Institute of Technology Ladkrabang, Bangkok, 10520, Thailand.
J Comput Aided Mol Des. 2021 Oct;35(10):1037-1053. doi: 10.1007/s10822-021-00418-1. Epub 2021 Oct 8.
Fast and accurate identification of inhibitors with potency against HCV NS5B polymerase is currently a challenging task. Conventional experimental methods are the gold standard for the design and development of new HCV inhibitors, but they often require a costly investment of time and resources. In this study, we developed a novel machine learning-based meta-predictor (termed StackHCV) for accurate and large-scale identification of HCV inhibitors. The existing method relies on a single-feature-based approach. In contrast, we first constructed a pool of baseline models by combining a wide range of heterogeneous molecular fingerprints with five popular machine learning algorithms (k-nearest neighbor, multi-layer perceptron, partial least squares, random forest and support vector machine). Second, we integrated these baseline models into the final meta-model by means of the stacking strategy. Extensive benchmarking experiments showed that StackHCV achieved more accurate and stable performance than its constituent baseline models on the training dataset, and that it also outperformed the existing predictor on the independent test dataset. To facilitate high-throughput identification of HCV inhibitors, we built a web server that is freely accessible at http://camt.pythonanywhere.com/StackHCV . StackHCV is expected to be a useful tool for the fast and precise identification of potential drugs against HCV NS5B, particularly for liver cancer therapy and other clinical applications.
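The stacking strategy described above can be illustrated with a minimal, self-contained Python sketch. This is not StackHCV's implementation. The toy 8-bit fingerprints, the hypothetical second fingerprint type (`coarse`), the use of k-NN as the only base learner, the three-fold split and the logistic-regression meta-learner are all illustrative assumptions. The sketch shows only the core idea: one baseline model per fingerprint type produces out-of-fold probabilities, and these probabilities become the input features of a meta-model that makes the final call.

```python
import math
from typing import List, Sequence, Tuple

Bits = Sequence[int]  # toy binary fingerprint (hypothetical, for illustration)


def tanimoto(a: Bits, b: Bits) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))


def knn_proba(X: List[Bits], y: List[int], query: Bits, k: int = 3) -> float:
    """Base learner: fraction of inhibitors among the k most similar compounds."""
    top = sorted(range(len(X)), key=lambda i: -tanimoto(X[i], query))[:k]
    return sum(y[i] for i in top) / len(top)


def out_of_fold(X: List[Bits], y: List[int], n_folds: int = 3, k: int = 3) -> List[float]:
    """Predict each compound with a base model that never saw its label."""
    preds = [0.0] * len(X)
    for fold in range(n_folds):
        train = [j for j in range(len(X)) if j % n_folds != fold]
        for i in range(len(X)):
            if i % n_folds == fold:
                preds[i] = knn_proba([X[j] for j in train], [y[j] for j in train], X[i], k)
    return preds


def fit_logistic(Z: List[List[float]], y: List[int], lr: float = 0.5,
                 epochs: int = 500) -> Tuple[List[float], float]:
    """Meta-learner (assumed here): logistic regression by batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
            gw = [g + (p - t) * zi for g, zi in zip(gw, z)]
            gb += p - t
        w = [wi - lr * g / len(Z) for wi, g in zip(w, gw)]
        b -= lr * gb / len(Z)
    return w, b


class StackedPredictor:
    """One k-NN base model per fingerprint type, stacked under a meta-learner."""

    def __init__(self, views: List[List[Bits]], y: List[int], k: int = 3):
        self.views, self.y, self.k = views, y, k
        # Meta-features: one out-of-fold probability per fingerprint type.
        oof = [out_of_fold(X, y, k=k) for X in views]
        self.w, self.b = fit_logistic([list(col) for col in zip(*oof)], y)

    def predict_proba(self, query_views: List[Bits]) -> float:
        # For a new compound, each base model uses all training compounds.
        z = [knn_proba(X, self.y, q, self.k) for X, q in zip(self.views, query_views)]
        return sigmoid(sum(wi * zi for wi, zi in zip(self.w, z)) + self.b)


def coarse(fp: Bits) -> List[int]:
    """Hypothetical second fingerprint type: OR of adjacent bit pairs."""
    return [fp[i] | fp[i + 1] for i in range(0, len(fp), 2)]


inhibitors = [[1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0],
              [1, 0, 1, 1, 0, 0, 0, 0], [0, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 1, 0, 0, 0]]
inactives = [[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 0, 1, 1, 1, 0], [0, 0, 0, 0, 1, 1, 0, 1],
             [0, 0, 0, 0, 1, 0, 1, 1], [0, 0, 0, 0, 0, 1, 1, 1], [1, 0, 0, 0, 1, 1, 1, 1]]
fps = inhibitors + inactives
labels = [1] * 6 + [0] * 6
model = StackedPredictor([fps, [coarse(fp) for fp in fps]], labels)

query_hit, query_miss = [1, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 0, 0]
p_hit = model.predict_proba([query_hit, coarse(query_hit)])
p_miss = model.predict_proba([query_miss, coarse(query_miss)])
print(f"inhibitor-like query: {p_hit:.3f}, inactive-like query: {p_miss:.3f}")
```

The meta-learner is trained on out-of-fold predictions because base-model scores on their own training compounds would be over-optimistic, which would bias the meta-model toward whichever baseline model overfits most.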