Makarov Dmitriy M, Ksenofontov Alexander A, Budkov Yury A
G. A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Ivanovo 153045, Russia.
Laboratory of Computational Physics, HSE University, Tallinskaya st. 34, Moscow 123458, Russia.
Chem Res Toxicol. 2025 Mar 17;38(3):392-399. doi: 10.1021/acs.chemrestox.4c00421. Epub 2025 Feb 19.
Predictive methods for assessing toxicological properties offer an alternative approach that helps identify safe compounds while reducing the cost of testing. The objective of the Tox24 Challenge was to assess progress in computational methods for predicting the binding activity of chemicals to transthyretin (TTR). The challenge data set, measured by the U.S. Environmental Protection Agency, comprised 1512 chemically diverse substances. This paper describes the model that won the Tox24 Challenge and the steps taken to improve it further. The Transformer convolutional neural network (CNN) model achieved the best performance as a standalone solution. A multitask model built on a graph CNN, trained with 11 additional acute systemic toxicity data sets and with increased weighting on the TTR binding activity, showed comparable results on the blind test set. The winning solution was a consensus model consisting of two CatBoost models, using the OEstate and Mold2 descriptor sets, and two transformer-based models. Adding a fifth model, based on multitask learning with the graph CNN method, reduced the RMSE on the blind test set to 20.3%. The winning model was developed using the OCHEM web platform and is available online at https://ochem.eu/article/162082.
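The abstract reports results for a consensus of several models, scored by RMSE on a blind test set. The sketch below illustrates that idea only. It assumes the consensus is an unweighted mean of the individual models' predictions, which the abstract does not state, and the function names (`rmse`, `consensus`) and all numbers are hypothetical and not taken from the paper or from OCHEM.

```python
import math
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def consensus(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-model predictions by an unweighted mean.

    Assumption: simple averaging. The abstract does not specify how the
    winning consensus model combined its members.
    """
    if len(predictions) == 0:
        raise ValueError("at least one model is required")
    n_compounds = len(predictions[0])
    if any(len(p) != n_compounds for p in predictions):
        raise ValueError("all models must predict the same compounds")
    return [sum(column) / len(predictions) for column in zip(*predictions)]


# Made-up activity values (in percent) for three hypothetical compounds.
measured = [10.0, 50.0, 90.0]
model_a = [20.0, 40.0, 80.0]   # hypothetical descriptor-based model
model_b = [0.0, 60.0, 100.0]   # hypothetical neural model

combined = consensus([model_a, model_b])

print("RMSE model A:  ", rmse(measured, model_a))
print("RMSE model B:  ", rmse(measured, model_b))
print("RMSE consensus:", rmse(measured, combined))
```

In this toy case the two models err in opposite directions, so averaging cancels their errors. Real models are rarely this complementary, but the same mechanism is one reason a consensus can score a lower RMSE than its individual members.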