Jia Xuelian, Ciallella Heather L, Russo Daniel P, Zhao Linlin, James Morgan H, Zhu Hao
The Rutgers Center for Computational and Integrative Biology, Joint Health Sciences Center, Camden, New Jersey 08103, United States.
Department of Psychiatry, Robert Wood Johnson Medical School, Rutgers University and Rutgers Biomedical and Health Sciences, Piscataway, New Jersey 08854, United States; Brain Health Institute, Rutgers University and Rutgers Biomedical and Health Sciences, Piscataway, New Jersey 08854, United States.
ACS Sustain Chem Eng. 2021 Mar 15;9(10):3909-3919. doi: 10.1021/acssuschemeng.0c09139. Epub 2021 Mar 4.
Compared with traditional experimental approaches, computational modeling is a promising strategy for prioritizing new candidates efficiently and at low cost. In this study, we developed a novel data mining and computational modeling workflow and demonstrated its applicability by screening for new analgesic opioids. A large opioid data set was used as the probe to automatically retrieve bioassay data from the PubChem portal. Based on the testing results across the probe compounds, 114 PubChem bioassays were selected for building quantitative structure-activity relationship (QSAR) models. For each bioassay, the tested compounds were used to develop 12 models, one for each combination of three machine learning approaches and four types of chemical descriptors. Model performance was evaluated by the coefficient of determination (R²) obtained from 5-fold cross-validation. In total, 49 models developed for 14 bioassays met the selection criteria, and these bioassays were found to be mainly associated with binding affinities to different opioid receptors. The models for these 14 bioassays were then used to fill data gaps in the probe opioid data set and to predict the activities of general drug compounds in the DrugBank data set. This study provides a universal modeling strategy that can take advantage of large public data sets for computer-aided drug design (CADD).
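The per-bioassay modeling step described in the abstract (three learners crossed with four descriptor types, each scored by 5-fold cross-validated R²) can be illustrated with a minimal, standard-library-only sketch. Everything named here is an assumption made for illustration: the abstract does not name the learners, the descriptors, or the selection criteria, so the mean baseline and nearest-neighbor learners, the `descriptor_A` to `descriptor_D` sets, the synthetic activity values, and `R2_CUTOFF` are hypothetical stand-ins and not the authors' method.

```python
import random
from itertools import product


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_r2(X, y, fit, k=5, seed=0):
    """Pool out-of-fold predictions over k folds, then compute R^2 once."""
    preds = [0.0] * len(y)
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            preds[i] = predict(X[i])
    return r2_score(y, preds)


def fit_mean(X, y):
    """Hypothetical baseline learner: always predict the training mean."""
    mean_y = sum(y) / len(y)
    return lambda x: mean_y


def make_knn(n_neighbors):
    """Hypothetical learner factory: average of the nearest training values."""
    def fit(X, y):
        def predict(x):
            order = sorted(
                range(len(X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)),
            )
            nearest = order[:n_neighbors]
            return sum(y[i] for i in nearest) / len(nearest)
        return predict
    return fit


def build_model_grid(descriptor_sets, learners, y, k=5):
    """Score every (descriptor, learner) pair by k-fold cross-validated R^2."""
    return {
        (d_name, l_name): cross_validated_r2(X, y, fit, k)
        for (d_name, X), (l_name, fit) in product(
            descriptor_sets.items(), learners.items()
        )
    }


# Synthetic stand-in for one bioassay: 40 compounds with made-up activities.
rng = random.Random(42)
y = [rng.uniform(4.0, 9.0) for _ in range(40)]

# Four hypothetical descriptor sets of decreasing quality (increasing noise).
descriptor_sets = {
    f"descriptor_{name}": [[yi + rng.gauss(0, noise), rng.random()] for yi in y]
    for name, noise in [("A", 0.1), ("B", 0.5), ("C", 1.0), ("D", 2.0)]
}

# Three hypothetical learners.
learners = {"mean_baseline": fit_mean, "1nn": make_knn(1), "3nn": make_knn(3)}

grid = build_model_grid(descriptor_sets, learners, y, k=5)  # 4 x 3 = 12 models

R2_CUTOFF = 0.5  # illustrative only; the abstract does not state its criteria
selected = {key: r2 for key, r2 in grid.items() if r2 >= R2_CUTOFF}

for (d_name, l_name), r2 in sorted(grid.items()):
    print(f"{d_name:13s} {l_name:14s} R2 = {r2:6.3f}")
```

Pooling the out-of-fold predictions before computing R² is one common convention; averaging per-fold R² values is another, and the abstract does not say which was used. In practice the toy learners would be replaced by real machine learning models and the synthetic matrices by computed chemical descriptors.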