From mundane to surprising nonadditivity: drivers and impact on ML models.

Affiliation

Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche AG, Basel, 4070, Switzerland.

Publication information

J Comput Aided Mol Des. 2024 Jul 25;38(1):26. doi: 10.1007/s10822-024-00566-0.

Abstract

Nonadditivity (NA) in Structure-Activity and Structure-Property Relationship (SAR) data is a rare but highly information-rich phenomenon. It can indicate conformational flexibility, structural rearrangements, and errors in assay results or structural assignment. While purely ligand-based conformational causes of NA are fairly well understood and mundane, other factors are less well understood and cause surprising NA that has a large influence on SAR analysis and ML model performance. Here we report a systematic analysis across a wide range of properties (20 on-target biological activities and 4 physicochemical ADME-related properties) to understand how frequently the various phenomena that may lead to NA occur. A set of novel descriptors was developed to characterize double transformation cycles and identify trends in NA. Double transformation cycles were classified into "surprising" and "mundane" categories, with the majority classed as mundane. We also examined commonalities among surprising cycles and found that LogP differences have the most significant impact on NA. NA behaved distinctly differently in on-target sets than in ADME sets. Finally, we show that machine learning models struggle with highly nonadditive data, indicating that a better understanding of NA is an important future research direction.
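
The abstract does not spell out how NA is computed for a double transformation cycle. As a rough illustration only, the sketch below uses the common double-mutant-cycle convention: four compounds differ by two transformations, and NA is the difference between the effect of one transformation measured with and without the other. The function name, argument names, and example values are hypothetical and are not taken from the paper.

```python
def nonadditivity(p_00: float, p_10: float, p_01: float, p_11: float) -> float:
    """Nonadditivity of one double transformation cycle (illustrative convention).

    Arguments are property values on a log scale (e.g. pIC50 or logD), all hypothetical:
      p_00: parent compound, neither transformation applied
      p_10: only transformation 1 applied
      p_01: only transformation 2 applied
      p_11: both transformations applied
    """
    effect_with_t2 = p_11 - p_01      # effect of transformation 1 when 2 is present
    effect_without_t2 = p_10 - p_00   # effect of transformation 1 when 2 is absent
    return effect_with_t2 - effect_without_t2


# Additive cycle: transformation 1 adds one log unit regardless of transformation 2.
additive = nonadditivity(p_00=6.0, p_10=7.0, p_01=6.5, p_11=7.5)

# Nonadditive cycle: the doubly transformed compound is 1.5 log units
# more active than additivity would predict.
nonadditive = nonadditivity(p_00=6.0, p_10=7.0, p_01=6.5, p_11=9.0)

print(additive, nonadditive)
```

Under this convention a value of zero means the two transformations contribute independently. How large a deviation must be to count as significant depends on assay noise, which this sketch does not model.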
