Wang Binyou, Tan Xiaoqiu, Guo Jianmin, Xiao Ting, Jiao Yan, Zhao Junlin, Wu Jianming, Wang Yiwei
School of Pharmacy, Southwest Medical University, Luzhou 646000, China.
School of Basic Medical Science, Southwest Medical University, Luzhou 646000, China.
Pharmaceutics. 2022 Apr 26;14(5):943. doi: 10.3390/pharmaceutics14050943.
Drug-induced immune thrombocytopenia (DITP) often occurs in patients receiving many drugs simultaneously, and clinicians frequently cannot reliably identify which drugs are the plausible culprits. Despite significant advances in laboratory-based DITP testing, in vitro assays remain expensive and, in certain cases, cannot provide a timely diagnosis. To address these shortcomings, this paper proposes an efficient machine learning-based method for predicting DITP toxicity. A small dataset of 225 molecules was constructed. The molecules were represented by six fingerprints, three descriptors, and their combinations. Seven classical machine learning models were compared to identify the best-performing one. The RDMD + PubChem-k-NN model gave the best prediction performance among all the models, achieving an area under the curve of 76.9% and an overall accuracy of 75.6% on the external validation set. The applicability domain (AD) analysis supports the prediction reliability of the RDMD + PubChem-k-NN model. Five structural fragments related to DITP toxicity were identified using the information gain (IG) method together with fragment frequency analysis. To the best of our knowledge, this is the first machine learning-based classification model for recognizing chemicals with DITP toxicity, and it can serve as an efficient tool in drug design and clinical therapy.
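As an illustration of the information gain idea mentioned above, the following minimal sketch scores one binary structural fragment against binary toxicity labels. The abstract does not specify the exact implementation, so the function names (`entropy`, `information_gain`, `toxic_frequency`) and the toy data are assumptions made for illustration only, not the authors' code or data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(has_fragment, labels):
    """Reduction in label entropy obtained by splitting on fragment presence.

    has_fragment: list of 0/1 flags, one per molecule (hypothetical input).
    labels: list of 0/1 toxicity labels, one per molecule.
    """
    n = len(labels)
    present = [y for f, y in zip(has_fragment, labels) if f]
    absent = [y for f, y in zip(has_fragment, labels) if not f]
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def toxic_frequency(has_fragment, labels, toxic=1):
    """Fraction of fragment-containing molecules that are labeled toxic."""
    present = [y for f, y in zip(has_fragment, labels) if f]
    if not present:
        return 0.0
    return sum(1 for y in present if y == toxic) / len(present)


# Toy data (invented): four molecules, two toxic and two non-toxic.
toy_labels = [1, 1, 0, 0]
informative_fragment = [1, 1, 0, 0]    # present exactly in the toxic molecules
uninformative_fragment = [1, 0, 1, 0]  # present equally in both classes

ig_informative = information_gain(informative_fragment, toy_labels)
ig_uninformative = information_gain(uninformative_fragment, toy_labels)
freq_informative = toxic_frequency(informative_fragment, toy_labels)
```

A fragment with high information gain and a high frequency among toxic molecules would be a candidate structural alert under this scheme; a fragment spread evenly across both classes scores an information gain of zero.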