Wilm Anke, Garcia de Lomana Marina, Stork Conrad, Mathai Neann, Hirte Steffen, Norinder Ulf, Kühnl Jochen, Kirchmair Johannes
Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany.
HITeC e.V., 22527 Hamburg, Germany.
Pharmaceuticals (Basel). 2021 Aug 11;14(8):790. doi: 10.3390/ph14080790.
In recent years, a number of machine learning models for predicting the skin sensitization potential of small organic molecules have been reported and have become available. These models generally perform well within their applicability domains, but because they rely on molecular fingerprints and other non-intuitive descriptors, their interpretability is limited. The aim of this work is to develop a strategy to replace the non-intuitive features with predicted outcomes of bioassays. We show that such replacement is indeed possible and that as few as ten interpretable, predicted bioactivities are sufficient to reach competitive performance. On a holdout data set of 257 compounds, the best model ("Skin Doctor CP:Bio") obtained an efficiency of 0.82 and a Matthews correlation coefficient (MCC) of 0.52 (at a significance level of 0.20). Skin Doctor CP:Bio is available free of charge for academic research. The modeling strategies explored in this work are easily transferable and could be adopted to develop more interpretable machine learning models for predicting the bioactivity and toxicity of small organic compounds.
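The abstract reports an efficiency and an MCC "at a significance level of 0.20", which are metrics of a conformal predictor. The sketch below shows how such numbers can be obtained. It assumes a Mondrian (class-conditional) conformal predictor for a binary task, a nonconformity score of 1 minus the predicted probability of a label, efficiency defined as the fraction of single-label prediction sets (one common definition), and MCC computed over single-label predictions only. All function names, calibration scores and probabilities are hypothetical toy values. This is not the Skin Doctor CP:Bio implementation.

```python
import math
from typing import Dict, List, Sequence, Set

# Hypothetical toy sketch of a Mondrian (class-conditional) conformal predictor
# for a binary task (0 = non-sensitizer, 1 = sensitizer). Not the Skin Doctor
# CP:Bio code; the scoring scheme, names and numbers are assumptions.


def p_value(cal_scores: Sequence[float], test_score: float) -> float:
    """Non-smoothed conformal p-value: share of calibration scores at least as
    nonconforming as the test score, counting the test example itself."""
    count = sum(1 for s in cal_scores if s >= test_score)
    return (count + 1) / (len(cal_scores) + 1)


def prediction_set(cal_by_class: Dict[int, Sequence[float]],
                   prob_active: float, significance: float) -> Set[int]:
    """Keep every label whose p-value exceeds the significance level.
    Assumed nonconformity score: 1 - predicted probability of that label."""
    scores = {0: prob_active, 1: 1.0 - prob_active}
    # Mondrian: each label is compared only with calibration examples of its class.
    return {label for label, cal in cal_by_class.items()
            if p_value(cal, scores[label]) > significance}


def efficiency(sets: List[Set[int]]) -> float:
    """Fraction of prediction sets that contain exactly one label."""
    return sum(1 for s in sets if len(s) == 1) / len(sets)


def mcc_single_label(sets: List[Set[int]], y_true: Sequence[int]) -> float:
    """Matthews correlation coefficient over single-label predictions only."""
    tp = tn = fp = fn = 0
    for s, y in zip(sets, y_true):
        if len(s) != 1:
            continue  # empty and two-label sets are not counted
        pred = next(iter(s))
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy nonconformity scores of calibration compounds, per true class.
cal_by_class = {0: [0.05, 0.10, 0.20, 0.30, 0.60],
                1: [0.10, 0.15, 0.25, 0.40, 0.55]}
probs = [0.9, 0.1, 0.5, 0.8, 0.3, 0.7]  # predicted probability of class 1
y_true = [1, 0, 1, 0, 1, 1]

sets = [prediction_set(cal_by_class, p, significance=0.20) for p in probs]
print(sets)                                      # [{1}, {0}, {0, 1}, {1}, {0}, {1}]
print(round(efficiency(sets), 2))                # 0.83
print(round(mcc_single_label(sets, y_true), 2))  # 0.17
```

With five calibration compounds per class, every p-value is a multiple of 1/6, so at a significance level of 0.20 a label is kept whenever at least one calibration score is as large as the test score. Lowering the significance level can only enlarge the prediction sets, so more compounds typically receive both labels and efficiency tends to drop; this trade-off is why such metrics are reported at a fixed significance level.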