Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Hospital del Mar Medical Research Institute, 08003 Barcelona, Spain; Department of Pharmacy and Pharmaceutical Technology and Parasitology, Universitat de València, 46100 Valencia, Spain.
Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Hospital del Mar Medical Research Institute, 08003 Barcelona, Spain.
Toxicol Lett. 2023 Nov 1;389:34-44. doi: 10.1016/j.toxlet.2023.10.013. Epub 2023 Oct 27.
New Approach Methodologies (NAMs) have ushered in a new era in toxicology, aiming to replace animal testing. Despite these advances, NAMs are not exempt from the inherent complexities of the endpoint under study. In this review, we identified three major groups of complexity: mechanistic, chemical space, and methodological. Mechanistic complexity arises from interconnected biological processes within a network that are challenging to model in a single step. Chemical space complexity arises when the compounds in the training and test series are significantly dissimilar. Methodological complexity encompasses algorithmic and molecular descriptor limitations and typical class imbalance problems. To address these complexities, this work provides a guide to the use of combinations of predictive Quantitative Structure-Activity Relationship (QSAR) models, known as metamodels. Combining low-level models (LLMs) in this way enables a more precise approach to the problem, because each model focuses on a different sub-mechanism or sub-process. For mechanistic complexity, multiple Molecular Initiating Events (MIEs) or levels of information are combined to form a mechanistic-based metamodel. For the complexity arising from chemical space, two types of approaches to constructing a fragment-based chemical space metamodel were reviewed: those with and those without structure sharing. Metamodels with structure sharing use unsupervised strategies to identify patterns in the data and build a low-level model for each cluster; these models are then combined. For situations in which structures cannot be shared because of pharmaceutical industry intellectual property, prediction sharing and federated learning approaches were reviewed.
Lastly, to tackle methodological complexity, various algorithms are combined to overcome their individual limitations, diverse descriptors are employed to enhance problem definition, and balanced dataset combinations are used to address class imbalance issues (methodological-based metamodels). Remarkably, metamodels consistently outperformed classical QSAR models across all cases, highlighting the importance of alternatives to classical QSAR models when faced with such complexities.
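As an illustration of the balanced dataset combination idea mentioned above, the following is a minimal sketch and not the method of any study covered by the review. It assumes a binary endpoint, numeric molecular descriptors, a nearest-centroid classifier as a stand-in for a low-level model, and majority voting as the combination rule. The function names `fit_centroid_model` and `fit_balanced_metamodel` and the toy data are hypothetical.

```python
import random
from statistics import mean


def fit_centroid_model(X, y):
    """Hypothetical low-level model: nearest-centroid classifier on descriptor vectors."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Column-wise mean of the descriptors for this class.
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        # Return the label whose centroid is closest to x.
        return min(centroids, key=lambda lab: dist2(centroids[lab]))

    return predict


def fit_balanced_metamodel(X, y, n_models=5, seed=0):
    """Train one low-level model per balanced subset and combine them by majority vote."""
    rng = random.Random(seed)
    counts = {lab: y.count(lab) for lab in set(y)}
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, lab in enumerate(y) if lab == minority]
    majority_idx = [i for i, lab in enumerate(y) if lab != minority]

    models = []
    for _ in range(n_models):
        # Balanced subset: every minority compound plus an equally sized
        # random draw (without replacement) from the majority class.
        idx = minority_idx + rng.sample(majority_idx, len(minority_idx))
        models.append(fit_centroid_model([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x):
        votes = [model(x) for model in models]
        # Majority vote; ties go to the smallest label for determinism.
        return max(sorted(set(votes)), key=votes.count)

    return predict


# Toy, made-up descriptor matrix: six inactive (0) and two active (1) compounds.
X = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0],
     [0.2, 0.2], [0.1, 0.1], [0.9, 1.0], [1.0, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

metamodel = fit_balanced_metamodel(X, y)
print(metamodel([0.95, 0.95]), metamodel([0.1, 0.15]))
```

Each low-level model sees all minority-class compounds but only a slice of the majority class, so no single model is dominated by the majority class, while the vote still draws on most of the available data. In practice the low-level learner, the number of subsets, and the combination rule would all be choices to validate against the endpoint in question.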