Department of Computer Science, University of Texas at Arlington, Arlington, Texas 76013, United States.
National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States.
Chem Res Toxicol. 2021 Feb 15;34(2):495-506. doi: 10.1021/acs.chemrestox.0c00322. Epub 2020 Dec 21.
Drug-induced liver injury (DILI) is a crucial factor in determining the qualification of potential drugs. However, the DILI property is extremely difficult to obtain due to the complex testing process. Consequently, an in silico screening in the early stage of drug discovery would help to reduce the total development cost by filtering out drug candidates with a high risk of causing DILI. To serve this screening goal, we apply several computational techniques to predict the DILI property, including traditional machine learning methods and graph-based deep learning techniques. While deep learning models require large amounts of training data to tune their many parameters, the DILI data set contains only a few hundred annotated molecules. To alleviate the data scarcity problem, we propose a property augmentation strategy that brings in a large amount of training data annotated with other properties. Extensive experiments demonstrate that our proposed method significantly outperforms all existing baselines on the DILI data set, obtaining an accuracy of 81.4% using cross-validation with random splitting, 78.7% using leave-one-out cross-validation, and 76.5% using cross-validation with scaffold splitting.
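The difference between random and scaffold splitting can be made concrete with a small example. The following is a minimal sketch, not the authors' procedure: it assumes each molecule already has a scaffold identifier (for instance a Bemis-Murcko scaffold string computed with a cheminformatics toolkit), and the function names `scaffold_kfold` and `train_test_splits` are hypothetical. Molecules sharing a scaffold are always placed in the same fold, so every test fold contains only scaffolds unseen during training, which is generally a harder setting than random splitting.

```python
from collections import defaultdict


def scaffold_kfold(scaffolds, n_folds=5):
    """Assign molecule indices to folds so that no scaffold spans two folds.

    `scaffolds` holds one scaffold identifier per molecule. These are
    assumed to be precomputed; this sketch does not derive them.
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    folds = [[] for _ in range(n_folds)]
    # Place the largest scaffold groups first, each into the currently
    # smallest fold, to keep fold sizes roughly balanced.
    for members in sorted(groups.values(), key=len, reverse=True):
        smallest = min(folds, key=len)
        smallest.extend(members)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


if __name__ == "__main__":
    # Toy scaffold labels, purely illustrative.
    toy_scaffolds = ["A", "A", "B", "C", "C", "C", "D"]
    for train, test in train_test_splits(scaffold_kfold(toy_scaffolds, n_folds=3)):
        print("train:", train, "test:", test)
```

Leave-one-out cross-validation corresponds to the opposite extreme, where each molecule forms its own fold regardless of scaffold.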