Bayer AG, Division CropScience, Alfred-Nobel-Str 50, Monheim 40789, Germany.
Chem Res Toxicol. 2024 Oct 21;37(10):1698-1711. doi: 10.1021/acs.chemrestox.4c00248. Epub 2024 Sep 20.
Inhibition of thyroid peroxidase (TPO) is a known molecular initiating event for thyroid hormone dysregulation and thyroid toxicity. Consequently, TPO is a critical off-target for the design of safer agrochemicals. To date, fewer than 500 structurally characterized TPO inhibitors are known, and the most comprehensive result set generated under identical conditions encompasses approximately 1000 compounds from a subset of the ToxCast compound collection. Here we describe a collaboration between wet lab and data scientists combining a large in vitro screen and the subsequent development of an in silico model for predicting TPO inhibition. The screen encompassed more than 100,000 diverse drug-like agrochemical compounds and yielded more than 6000 structurally novel TPO inhibitors. On this foundation, we applied different machine learning techniques and compared their performance. We discuss use cases for in silico TPO models in agrochemical research and explain that model recall is of particular importance when selecting compounds from large virtual compound collections. Furthermore, we show that due to the higher structural diversity of our training data, our final model allowed better generalization than models trained on the ToxCast data set. We now have a tool to predict TPO inhibition even for molecules that are only available virtually, such as hits from virtual screenings, or compounds under consideration for inclusion in our screening collection. Structures and activity data for 34,524 compounds are provided. This data set includes almost all inhibitors, including more than 3000 proprietary structures, and a large proportion of the inactives.
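The emphasis on recall can be made concrete with a small sketch. One plausible reading, stated here as an assumption rather than as the authors' reasoning, is that when a model is used to filter a large virtual collection, every true TPO inhibitor the model misses (a false negative) passes through unnoticed, so the fraction of inhibitors caught (recall) matters more than the fraction of flags that are correct (precision). The labels below are hypothetical toy values, not data from the study, and the function name is illustrative only.

```python
def recall_and_precision(y_true, y_pred):
    """Return (recall, precision) for binary labels, where 1 = TPO inhibitor."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # inhibitors correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # inhibitors missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # inactives wrongly flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Hypothetical toy example: 3 true inhibitors among 10 compounds.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

recall, precision = recall_and_precision(y_true, y_pred)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy case one of the three inhibitors is missed, so a selection based on these predictions would still contain an undetected inhibitor even though most flags are correct.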