Pandey Sapna Kumari, Roy Kunal
Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
Toxicology. 2023 Dec;500:153676. doi: 10.1016/j.tox.2023.153676. Epub 2023 Nov 21.
Mutagenicity is an important endpoint from the regulatory, environmental, and medical points of view. Because a large number of compounds may be of concern, and because rodent mutagenicity bioassays are very expensive in terms of time, money, and animals, this endpoint is a major target for the development of alternative approaches for screening and prediction. Most older expert systems and quantitative structure-activity relationship (QSAR) models may show reduced performance over time when applied to newer chemical candidates; researchers therefore constantly try to improve modeling strategies. In this report, we first performed traditional classification-based linear discriminant analysis (LDA) QSAR modeling using the benchmark Ames dataset of diverse chemicals (6512 compounds) to identify the relationship between the molecules and their potential mutagenic behavior. The classical LDA QSAR model was developed from a selected set of 2D descriptors, 31 in total, identified by analyzing the most discriminating features. Additionally, we used similarity-derived features obtained from read-across (RA) to develop an RA-based QSAR model. The RA-based LDA QSAR model has better predictivity, transferability, and interpretability than the classical LDA QSAR model, and it uses far fewer descriptors. For comparison, different machine learning (ML) models were also developed using the descriptors appearing in the RA-based LDA QSAR model. We checked the prediction quality for 216 true external set compounds using the novel similarity-derived RA model, and we compared the performance of the OECD toolbox with that of the RA-derived LDA QSAR model on the same true external set. The current study aimed to explore the significance of the read-across-based algorithm and its application to the most current experimental mutagenicity data to complement already available expert systems.
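To make the idea of similarity-derived read-across features more concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes each compound is represented as a set of "on" fingerprint bits, uses Tanimoto similarity, and derives three illustrative features from the k most similar training compounds. The function names, the feature definitions, and the choice of k are all hypothetical; the abstract does not specify how the RA features were computed.

```python
from typing import List, Sequence, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def read_across_features(
    query: Set[int],
    train_fps: Sequence[Set[int]],
    train_labels: Sequence[int],
    k: int = 3,
) -> Tuple[float, float, float]:
    """Illustrative (hypothetical) similarity-derived features for one query.

    Returns:
      weighted_score: similarity-weighted mean label of the k nearest
                      training compounds (1 = mutagenic, 0 = non-mutagenic)
      max_sim_pos:    highest similarity to a mutagenic neighbour
      max_sim_neg:    highest similarity to a non-mutagenic neighbour
    """
    # Rank training compounds by similarity to the query and keep the top k.
    neighbours: List[Tuple[float, int]] = sorted(
        ((tanimoto(query, fp), label) for fp, label in zip(train_fps, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    total_sim = sum(sim for sim, _ in neighbours)
    if total_sim == 0.0:
        # No similar neighbour at all: fall back to an uninformative score.
        weighted_score = 0.5
    else:
        weighted_score = sum(sim * label for sim, label in neighbours) / total_sim

    max_sim_pos = max((sim for sim, label in neighbours if label == 1), default=0.0)
    max_sim_neg = max((sim for sim, label in neighbours if label == 0), default=0.0)
    return weighted_score, max_sim_pos, max_sim_neg


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy_fps = [{1, 2, 3, 4}, {1, 2, 3}, {7, 8, 9}, {7, 8}]
    toy_labels = [1, 1, 0, 0]
    print(read_across_features({1, 2, 3, 5}, toy_fps, toy_labels))
```

In a workflow like the one described, a small feature vector of this kind would be computed for every compound and passed to a classifier such as LDA in place of, or alongside, the conventional 2D descriptors.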