Kim Eunyoung, Nam Hojung
School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-gu, Gwangju, 61005, Republic of Korea.
BMC Bioinformatics. 2017 May 31;18(Suppl 7):227. doi: 10.1186/s12859-017-1638-4.
Drug-induced liver injury (DILI) is a critical issue in drug development because it causes failures in clinical trials and the withdrawal of approved drugs from the market. There have been many attempts to predict the risk of DILI based on in vivo and in silico identification of hepatotoxic compounds. In the current study, we propose an in silico model that predicts DILI using weighted molecular fingerprints.
In this study, we used an 881-bit molecular fingerprint as the feature set, with each bit describing the presence or absence of a substructure in a compound. The Bayesian probability of each substructure was then calculated for each label (positive or negative for DILI), and a weighted fingerprint was derived from the ratio of the DILI-positive to the DILI-negative probability values. Using the weighted fingerprint features, prediction models were trained and evaluated with the Random Forest (RF) and Support Vector Machine (SVM) algorithms. The constructed models yielded accuracies of 73.8% and 72.6% and AUCs of 0.791 and 0.768 in cross-validation. In independent tests, the models achieved accuracies of 60.1% and 61.1% for RF and SVM, respectively. The results confirmed that weighted features helped increase the overall performance of the prediction models. The constructed models were further applied to natural compounds in herbs to identify their DILI potential, and 13,996 unique herbal compounds were predicted as DILI-positive with the SVM model.
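The weighting step described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the following is an assumption: the per-bit probabilities are estimated from training counts with Laplace smoothing, and each set bit is multiplied by the ratio of the DILI-positive to the DILI-negative probability. The function names `bit_weights` and `apply_weights`, the smoothing parameter `alpha`, and the toy 3-bit data are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def bit_weights(fingerprints: Sequence[Sequence[int]],
                labels: Sequence[int],
                alpha: float = 1.0) -> List[float]:
    """Return one weight per fingerprint bit.

    Assumed scheme (not confirmed by the source): for each bit, estimate
    P(positive | bit set) and P(negative | bit set) from training counts
    with Laplace smoothing `alpha`, then take their ratio.
    """
    n_bits = len(fingerprints[0])
    pos = [0] * n_bits  # DILI-positive compounds with the bit set
    neg = [0] * n_bits  # DILI-negative compounds with the bit set
    for fp, label in zip(fingerprints, labels):
        for j, bit in enumerate(fp):
            if bit:
                if label == 1:
                    pos[j] += 1
                else:
                    neg[j] += 1

    weights = []
    for j in range(n_bits):
        total = pos[j] + neg[j] + 2 * alpha
        p_pos = (pos[j] + alpha) / total
        p_neg = (neg[j] + alpha) / total
        weights.append(p_pos / p_neg)
    return weights


def apply_weights(fp: Sequence[int], weights: Sequence[float]) -> List[float]:
    """Scale each bit by its weight; unset bits stay at zero."""
    return [bit * w for bit, w in zip(fp, weights)]


# Toy data: 4 compounds, 3 bits (the study used 881 bits); 1 = DILI-positive.
toy_fps = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 1, 0]]
toy_labels = [1, 1, 0, 0]
toy_weights = bit_weights(toy_fps, toy_labels)
toy_weighted = apply_weights(toy_fps[0], toy_weights)
```

Under this assumed scheme, a weight above 1 marks a substructure seen more often in DILI-positive compounds and a weight below 1 marks one seen more often in DILI-negative compounds. The weighted vectors would then replace the binary fingerprints as input to a classifier such as RF or SVM.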
The prediction models with weighted features performed better than the non-weighted models. Moreover, we predicted the DILI potential of herbs with the best-performing model, and the prediction results suggest that many herbal compounds could potentially cause DILI. We can thus infer that taking natural products without detailed references about the relevant pathways may be dangerous. Considering how frequently compounds from natural herbs are used and their increasing application in drug development, DILI labeling would be very important.